model_merge.md 9.2 KB

Guide to Merging FATE Model

1. Overview

Trained FATE models from federated job of host and guest may be merged locally and reused in other frameworks. Currently, FATE supports merging and exporting models from HeteroLR, HeteroSSHELR and HeteroSecureBoost into PMML or sklearn Logistic Regression Model/lightGBM objects. Below we walk through the general process of merging a Hetero Logistic Regression model using FATE-Flow Client command line tool.

2. Install FATE-Flow Client

flow command line tool is distributed along with FATE-Client. Once FATE-Client installed, one can find a cmd enterpoint named flow:

pip install fate_client
flow --help
Usage: flow [OPTIONS] COMMAND [ARGS]...

  Fate Flow Client

Options:
  -h, --help  Show this message and exit.

Commands:
  checkpoint  Checkpoint Operations
  component   Component Operations
  data        Data Operations
  init        Flow CLI Init Command
  job         Job Operations
  key         Key Operations
  model       Model Operations
  privilege   Privilege Operations
  provider    Component Provider Operations
  queue       Queue Operations
  resource    Resource Manager
  server      FATE Flow Server Operations
  service     FATE Flow External Service Operations
  table       Table Operations
  tag         Tag Operations
  task        Task Operations
  template    Template Operations
  test        FATE Flow Test Operations
  tracking    Component Operations

To use FATE-Flow Client in terminal, we need to first specify which FATE Flow Service to connect to. Assume we have a FATE Flow Service at 172.15.0.1:9380, then execute the following command to initialize FATE-Flow Client:

flow init -c /data/projects/fate/conf/service_conf.yaml
flow init --ip 172.15.0.1 --port 9380

In the following steps, we assume that host is the role executes merging action.

3. Prepare Model

To obtain guest's trained Hetero Logistic Regression, we need to first make host load exported model. For models trained in standalone mode, you may skip this step.

3.1 Export Model(Cluster mode)

Below is an example model exportation configuration, adopted from here:

{
    "role": "guest",
    "party_id": 9999,
    "model_id": "arbiter-10000#guest-9999#host-10000#model",
    "model_version": "202208291446341887230",
    "output_path": "/data/projects/fate/examples/export_model"
}

Run the following command to export specified model:

flow model export -c export_model.json

If model exportation succeeds, flow will return success message like this:

{
    "retcode": 0,
    "file": "/data/projects/fate/examples/model_export/guest#9999#arbiter-10000#guest-9999#host-10000#model_202208291446341887230.zip",
    "retmsg": "download successfully, please check /data/projects/fate/examples/model_export/guest#9999#arbiter-10000#guest-9999#host-10000#model_202208291446341887230.zip"
}

3.2 Import Model(Cluster mode)

If the FATE model is trained in distributed mode, where guest and host have FATE each installed in different machines, exported model needs to be imported/registered on merging role's system first before merging models.

Here is an example configuration for model importation adopted from here, assuming we have put the exported guest's model from above step to directory "/data/projects/fate/temp" on host's machine:

{
  "role": "guest",
  "party_id": 9999,
  "model_id": "arbiter-10000#guest-9999#host-10000#model",
  "model_version": "202208291446341887230",
  "file": "/data/projects/fate/temp/guest#9999#arbiter-10000#guest-9999#host-10000#model_202208291446341887230.zip"
}

Run the following command to import specified model:

flow model import -c import_model.json

If model importation succeeds, flow will return success message like this:

{
    "data": {
        "job_id": "202208291521250963160",
        "model_id": "arbiter-10000#guest-9999#host-10000#model",
        "model_version": "202208291446341887230",
        "party_id": "9999",
        "role": "guest"
    },
    "retcode": 0,
    "retmsg": "success"
}

4. Merge Model

Hetero Logistic Regression Example

Once model is ready, run the following command to merge models into a sklearn Logistic Regression model:

flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 --guest-party-id 9999 --host-party-ids 10000 --component-name hetero_lr_0 --model-type lr --output-format sklearn --target-name y --include-guest-coef --output-path model_merge_sklearn

We may then load and predict with the obtained sklearn model:

import base64
import json
import pandas
import pickle

filename = "model_merge_sklearn"
test_data_h = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_host.csv",
                                    header=0, index_col="id")
test_data_g = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_guest.csv",
                                      header=0, index_col="id")
test_data_g = test_data_g.loc[:, test_data_g.columns != 'y']
test_data = pandas.concat((test_data_g, test_data_h), axis=1)

with open(filename, "r") as f:
    loaded_model = pickle.loads(base64.b64decode(json.load(f)))
    predict_score = loaded_model.predict_proba(test_data)

HeteroSecureBoost Example

If the federated model is a HeteroSecureBoost Model, run this command to merge model and export it in LightGBM format.

flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 
--guest-party-id 9999 --host-party-ids 10000 --component-name hetero_secureboost_0 --model-type secureboost --output-format lgb --target-name y --output-path ./lgb.txt

If you want a model in PMML format, use this command:

flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 
--guest-party-id 9999 --host-party-ids 10000 --component-name hetero_secureboost_0 --model-type secureboost --output-format pmml --target-name y --output-path ./lgb.txt

Please notice that the model merge function offers a parameter 'host_rename'. Host features will be renamed by adding a sitename suffix to the original feature name once this option is on:

Host party id is 9998, sitename is host_9998, origin feature is x1

x1 -> x1, default setting
x1 -> x1_host_9998, when enable host_rename

Use this command to rename host names when output

flow component hetero-model-merge --model-id arbiter-10000#guest-9999#host-10000#model --model-version 202208291446341887230 --host-rename
--guest-party-id 9999 --host-party-ids 10000 --component-name hetero_secureboost_0 --model-type secureboost --output-format lgb --target-name y --output-path ./lgb.txt

We may then load and predict with the obtained lightgbm model. Remember that to get the correct predicted result, the order of features in predicted data should be in the same order as the merged tree model.

import json
import pandas
import lightgbm as lgb

filename = "./lgb.txt"
test_data_h = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_host.csv",
                                    header=0, index_col="id")
test_data_g = pandas.read_csv("/data/projects/fate/examples/data/breast_hetero_guest.csv",
                                      header=0, index_col="id")
test_data_g = test_data_g.loc[:, test_data_g.columns != 'y']
test_data = pandas.concat((test_data_g, test_data_h), axis=1)

bst = lgb.Booster(model_str=open(filename, 'r').read())

pred_rs = bst.predict(test_data)

For more information on command options, please check help menu:

flow component hetero-model-merge --help
Usage: flow component hetero-model-merge [OPTIONS]

  - DESCRIPTION:
      Merge a hetero model.

  - USAGE:
      flow component hetero-model-merge --model-id guest-9999#host-9998#model --model-version 202208241838502253290 --guest-party-id 9999 --host-party-ids 9998,9997 --component-name hetero_secure_boost_0 --model-type secureboost --output-format pmml --target-name y --no-host-rename --no-include-guest-coef --output-path model.xml

Options:
  --model-id TEXT                 Model id.  [required]
  --model-version TEXT            Model version.  [required]
  -gid, --guest-party-id TEXT     A valid party id.  [required]
  -hids, --host-party-ids TEXT    Multiple party ids, use a comma to separate
                                  each one.  [required]

  -cpn, --component-name TEXT     A valid component name.  [required]
  --model-type TEXT               [required]
  --output-format TEXT            [required]
  --target-name TEXT
  --host-rename / --no-host-rename
  --include-guest-coef / --no-include-guest-coef
  -o, --output-path PATH          User specifies output directory path.
                                  [required]

  -h, --help                      Show this message and exit.