
Inter-cluster Model Migration

The model migration function makes it possible to copy a model file to a cluster with a different party_id and still use it there. Migration is needed in the following two scenarios:

  1. The cluster of any participant that generated the model is redeployed, and its party_id changes after the redeployment. For example, the source participants are arbiter-10000#guest-9999#host-10000, and they change to arbiter-10000#guest-99#host-10000.
  2. One or more participants copy the model file from the source cluster to a target cluster, where the model needs to be used.

Basics:

  1. In both scenarios the participants' party_id values change, for example arbiter-10000#guest-9999#host-10000 -> arbiter-10000#guest-99#host-10000, or arbiter-10000#guest-9999#host-10000 -> arbiter-100#guest-99#host-100.
  2. Because the participants' party_id values change, the model_id and the parts of the model file that contain party_id must be changed as well.
  3. The overall process has three steps: copy and transfer the original model file, execute the model migration task on the original model file, and import the new model generated by the migration task.
  4. The migration task does not modify the original model file in place. It makes a temporary copy at execution time, then modifies the model_id and the party_id-related contents of that copy according to the configuration, so that the model fits the new participants' party_id values.
  5. All of the above steps must be performed on every new participant, even if the party_id of a target participant has not changed.
  6. The version of each new participant's cluster must be 1.5.1 or later.
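To make the model_id change in points 1 and 2 concrete, here is a minimal Python sketch of the renaming. It is only an illustration and not the code that FATE Flow runs. It assumes the model_id layout seen in the examples of this document (role-party_id segments joined by "#", followed by a trailing "model" segment, one party per role). The function name migrate_model_id and the mapping format are hypothetical.

```python
def migrate_model_id(model_id: str, party_id_map: dict) -> str:
    """Rewrite the party_id values embedded in a model_id.

    party_id_map maps (role, old_party_id) -> new_party_id.
    Segments that are not in the map are kept unchanged.
    """
    new_parts = []
    for part in model_id.split("#"):
        role, sep, party_id = part.partition("-")
        # Only segments shaped like "<role>-<digits>" are candidates for renaming.
        key = (role, int(party_id)) if sep and party_id.isdigit() else None
        if key in party_id_map:
            new_parts.append(f"{role}-{party_id_map[key]}")
        else:
            # e.g. the trailing "model" segment, or a party whose id did not change
            new_parts.append(part)
    return "#".join(new_parts)


if __name__ == "__main__":
    mapping = {("arbiter", 10000): 100, ("guest", 9999): 99, ("host", 10000): 100}
    print(migrate_model_id("arbiter-10000#guest-9999#host-10000#model", mapping))
```

The real migration task also rewrites the contents of the model file, which this sketch does not attempt.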

The migration process is as follows.

Transfer the model file

Package the model files generated on the machine that hosts the source participant's fate flow service (including the directory named by model id) and transfer them to the machine that hosts the target participant's fate flow service. Place the model files in the following fixed directory:

$FATE_PROJECT_BASE/model_local_cache

Instructions:

  1. Transferring the folder is sufficient. If you compress and pack the files for the transfer, extract the model files into the directory where the models are stored after the transfer.
  2. Transfer the model files one by one, per source participant.

Preparation work before migration

Instructions

  1. Install fate-client, the client that supports model migration, by following the fate flow client documentation. Only fate 1.5.1 and later are supported.

Execute the migration task

Description

  1. The migration task replaces the model_id, the model_version, and the role- and party_id-related contents of the source model file according to the migration task configuration file.

  2. The cluster that submits the task must have completed the migration preparation above.

1. Modify the configuration file

On the new participant (machine), modify the migration task configuration file according to the actual situation. The following is an example migration task configuration file, migrate_model.json:

{
  "job_parameters": {
    "federated_mode": "SINGLE"
  },
  "role": {
    "guest": [9999],
    "arbiter": [10000],
    "host": [10000]
  },
  "migrate_initiator": {
    "role": "guest",
    "party_id": 99
  },
  "migrate_role": {
    "guest": [99],
    "arbiter": [100],
    "host": [100]
  },
  "execute_party": {
    "guest": [9999],
    "arbiter": [10000],
    "host": [10000]
  },
  "model_id": "arbiter-10000#guest-9999#host-10000#model",
  "model_version": "202006171904247702041",
  "unify_model_version": "202901_0001"
}

Save the above configuration to a location on the server and modify it there.

The following are explanatory notes for the parameters in this configuration.

  1. job_parameters: The federated_mode field takes one of two values, MULTIPLE or SINGLE. If set to SINGLE, the migration job is executed only on the party that submits it, so the job must be submitted separately on every new participant. If set to MULTIPLE, the job is distributed to the participants specified in execute_party, so it only needs to be submitted on the new participant that acts as migrate_initiator.
  2. role: The roles of the participants that generated the original model and their corresponding party_id values.
  3. migrate_initiator: The task initiator of the migrated model. Specify the initiator's role and party_id.
  4. migrate_role: The roles and party_id values of the migrated model.
  5. execute_party: The roles and party_id values of the parties that need to execute the migration. These are the source cluster party_id values.
  6. model_id: The model_id of the original model to be migrated.
  7. model_version: The model_version of the original model to be migrated.
  8. unify_model_version: Optional. The model_version of the new model. If it is not provided, the new model uses the job_id of the migration job as its model_version.
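The parameter notes above can be turned into a small client-side sanity check before a job is submitted. The sketch below is not FATE Flow's own validation. The function name check_migrate_config is hypothetical, and two of its rules are assumptions drawn from the example configuration rather than documented requirements: that role and migrate_role list the same roles with the same number of parties, and that migrate_initiator appears in migrate_role.

```python
REQUIRED_KEYS = (
    "job_parameters", "role", "migrate_initiator", "migrate_role",
    "execute_party", "model_id", "model_version",
)  # unify_model_version is optional, so it is not listed here


def check_migrate_config(conf: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in conf]

    mode = conf.get("job_parameters", {}).get("federated_mode")
    if mode not in ("SINGLE", "MULTIPLE"):
        problems.append(f"federated_mode must be SINGLE or MULTIPLE, got {mode!r}")

    role = conf.get("role", {})
    migrate_role = conf.get("migrate_role", {})
    # Assumption: every source role has a counterpart in the migrated model.
    if set(role) != set(migrate_role):
        problems.append("role and migrate_role list different roles")
    for name in set(role) & set(migrate_role):
        if len(role[name]) != len(migrate_role[name]):
            problems.append(f"role and migrate_role differ in party count for {name}")

    # Assumption: the new initiator is one of the migrated parties.
    initiator = conf.get("migrate_initiator", {})
    if initiator.get("party_id") not in migrate_role.get(initiator.get("role"), []):
        problems.append("migrate_initiator is not listed in migrate_role")

    return problems
```

Running it on the example migrate_model.json content should return an empty list.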

The example configuration above is interpreted as follows.

  1. The source model has guest: 9999, host: 10000 and arbiter: 10000 as participants. It is migrated to a model with guest: 99, host: 100 and arbiter: 100 as participants, with guest: 99 as the new initiator.
  2. federated_mode: SINGLE means that each migration task is executed only in the cluster where it is submitted, so the task must be submitted in 99 and in 100 separately.
  3. If the task is executed at 99, execute_party is configured as "guest": [9999].
  4. If the task is executed at 100, execute_party is configured as "arbiter": [10000], "host": [10000].
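The per-cluster execute_party values in points 3 and 4 can be derived mechanically from role and migrate_role. The following sketch assumes that the two mappings pair up position by position within each role (the first guest in role becomes the first guest in migrate_role, and so on). That pairing matches the example but is an assumption, and the function name execute_party_for is hypothetical.

```python
def execute_party_for(role: dict, migrate_role: dict, target_party_id: int) -> dict:
    """Collect the source parties whose migrated party_id equals target_party_id."""
    execute_party = {}
    for role_name, new_ids in migrate_role.items():
        old_ids = role.get(role_name, [])
        # Pair old and new party ids by position within the same role.
        for old_id, new_id in zip(old_ids, new_ids):
            if new_id == target_party_id:
                execute_party.setdefault(role_name, []).append(old_id)
    return execute_party


if __name__ == "__main__":
    role = {"guest": [9999], "arbiter": [10000], "host": [10000]}
    migrate_role = {"guest": [99], "arbiter": [100], "host": [100]}
    print(execute_party_for(role, migrate_role, 99))
    print(execute_party_for(role, migrate_role, 100))
```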

2. Submit migration tasks (separate operations in all target clusters)

Migration tasks are submitted using fate-client. A sample command is as follows.

flow model migrate -c $FATE_FLOW_BASE/examples/model/migrate_model.json
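When the submission is scripted, the same command can be called from Python. This is a thin wrapper sketch around the command shown above and nothing more: it assumes that the flow executable from fate-client is installed, initialized and on the PATH, and the helper names are hypothetical.

```python
import subprocess


def build_migrate_command(conf_path: str) -> list:
    """Build the argument list for the migrate command shown in this section."""
    return ["flow", "model", "migrate", "-c", conf_path]


def submit_migration(conf_path: str) -> str:
    """Run the command and return its standard output (the JSON result text)."""
    completed = subprocess.run(
        build_migrate_command(conf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits with a non-zero code
    )
    return completed.stdout
```

With federated_mode set to SINGLE, this call has to be repeated on every target cluster with that cluster's own configuration file.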

3. Task execution results

The following is the content of the configuration file for the actual migration task.

{
  "job_parameters": {
    "federated_mode": "SINGLE"
  },
  "role": {
    "guest": [9999],
    "host": [10000]
  },
  "migrate_initiator": {
    "role": "guest",
    "party_id": 99
  },
  "migrate_role": {
    "guest": [99],
    "host": [100]
  },
  "execute_party": {
    "guest": [9999],
    "host": [10000]
  },
  "model_id": "guest-9999#host-10000#model",
  "model_version": "202010291539339602784",
  "unify_model_version": "fate_migration"
}

This task migrates the model with model_id guest-9999#host-10000#model and model_version 202010291539339602784 from a cluster with party_id 9999 (guest) and 10000 (host) to a new model that fits a cluster with party_id 99 (guest) and 100 (host).

The following is the result of a successful migration.

{
    "data": {
        "detail": {
            "guest": {
                "9999": {
                    "retcode": 0,
                    "retmsg": "Migrating model successfully. the configuration of model has been modified automatically. new model id is: guest-99#host-100#model, Model files can be found at '/data/projects/fate/temp/fate_flow/guest#99#guest-99#host-100#model_fate_migration.zip'."
                }
            },
            "host": {
                "10000": {
                    "retcode": 0,
                    "retmsg": "Migrating model successfully. The configuration of model has been modified automatically, Model files can be found at '/data/projects/fate/temp/fate_flow/host#100#guest-99#host-100#model_fate_migration.zip'."
                }
            }
        },
        "guest": {
            "9999": 0
        },
        "host": {
            "10000": 0
        }
    },
    "jobId": "202010292152299793981",
    "retcode": 0,
    "retmsg": "success"
}

After the task succeeds, a migrated model zip file is generated on each executing party's machine, and its path is included in the returned result. For example, the migrated model file for 9999 (guest) is /data/projects/fate/temp/fate_flow/guest#99#guest-99#host-100#model_fate_migration.zip, and the migrated model file for 10000 (host) is /data/projects/fate/temp/fate_flow/host#100#guest-99#host-100#model_fate_migration.zip. The new model_id and model_version can also be obtained from the returned result.
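The zip paths can be pulled out of the returned result programmatically. The sketch below parses the free-text retmsg with a regular expression, so it assumes that the phrase "Model files can be found at '...'" keeps the wording shown in the sample result. That makes it fragile, and it should be treated as a convenience rather than a stable interface. The function name migrated_zip_paths is hypothetical.

```python
import re

# Matches the quoted path that follows the phrase used in the sample retmsg.
_ZIP_PATH = re.compile(r"Model files can be found at '([^']+)'")


def migrated_zip_paths(result: dict) -> dict:
    """Map (role, source_party_id) to the migrated zip path reported in retmsg."""
    paths = {}
    detail = result.get("data", {}).get("detail", {})
    for role, parties in detail.items():
        for party_id, info in parties.items():
            if info.get("retcode") != 0:
                raise RuntimeError(f"migration failed for {role} {party_id}: {info.get('retmsg')}")
            match = _ZIP_PATH.search(info.get("retmsg", ""))
            if match:
                paths[(role, party_id)] = match.group(1)
    return paths
```

The keys use the source party_id (for example "9999"), because that is how the result is organized; the file itself belongs on the corresponding target party.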

4. Transferring files and importing (separate operation in all target clusters)

After the migration task succeeds, manually transfer the newly generated model zip file to the fate flow machine of the target cluster. For example, the new model zip file generated by 9999 (guest) in step 3 needs to be transferred to the 99 (guest) machine. The zip file can be placed anywhere on that machine. Next, configure the model import task; see import_model.json for an example configuration file. (This configuration file is also included in the zip file. Modify it according to the actual situation; do not use it directly.)

The following is an example of the configuration file for importing the migrated model in guest (99).

{
  "role": "guest",
  "party_id": 99,
  "model_id": "guest-99#host-100#model",
  "model_version": "202010292152299793981",
  "file": "/data/projects/fate/python/temp/guest#99#guest-99#host-100#202010292152299793981.zip"
}

Fill in the role, the current party's party_id, the new model_id and model_version of the migrated model, and the path to the zip file of the migrated model according to the actual situation.
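If the import configuration is generated by a script rather than edited by hand, it can be assembled from those five values as sketched below. The key names are the ones used in the import example of this document. The function name build_import_config is hypothetical, and the values passed in must come from your own migration result.

```python
import json


def build_import_config(role: str, party_id: int, model_id: str,
                        model_version: str, zip_path: str) -> str:
    """Return the import task configuration as JSON text."""
    conf = {
        "role": role,                    # role of the current (target) party
        "party_id": party_id,            # party_id of the current (target) party
        "model_id": model_id,            # new model_id produced by the migration
        "model_version": model_version,  # new model_version produced by the migration
        "file": zip_path,                # where the migrated zip file was placed on this machine
    }
    return json.dumps(conf, indent=2)
```

The returned text can be written to a file and passed to the import command with the -c option.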

The following is a sample command to import the model using fate-client.

flow model import -c $FATE_FLOW_BASE/examples/model/import_model.json

The import is considered successful when it returns the following.

{
  "data": {
    "job_id": "202208261102212849780",
    "model_id": "arbiter-10000#guest-9999#host-10000#model",
    "model_version": "foobar",
    "party_id": "9999",
    "role": "guest"
  },
  "retcode": 0,
  "retmsg": "success"
}

The migration is now complete. The new model_id and model_version can be used to submit prediction tasks with the migrated model.