Overall Design
1. Logical Architecture
- DSL-defined jobs (see the sketch after this list)
- Top-down vertical scheduling of the subtask flow, with joint coordination of subtasks across multiple participants
- Task execution in independent, isolated worker processes
- Support for multiple types and versions of components
- Computational abstraction API
- Storage abstraction API
- Cross-party transfer abstraction API
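As a rough illustration of a DSL-defined job, the sketch below expresses a minimal two-component pipeline as a Python dict; the module names and field layout are illustrative assumptions, not the exact FATE DSL schema.

```python
# Minimal sketch of a DSL-defined job: components are wired together by
# declaring which upstream outputs feed which downstream inputs.
# Module names and field layout here are illustrative, not the exact FATE DSL schema.
job_dsl = {
    "components": {
        "reader_0": {
            "module": "Reader",
            "output": {"data": ["data"]},
        },
        "hetero_lr_0": {
            "module": "HeteroLR",
            "input": {"data": {"train_data": ["reader_0.data"]}},
            "output": {"data": ["train"], "model": ["hetero_lr"]},
        },
    }
}

# The scheduler can derive the execution order by topologically sorting the
# dependency edges implied by input references such as "reader_0.data".
```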
2. Service Architecture
2.1 FATE
2.2 FATE Flow
3. Scheduling Architecture
3.1 A new scheduling architecture based on shared state
- Decouple state (resources, jobs) from the managers (scheduler, resource manager)
- Resource state and job state are persisted in MySQL and shared globally, providing reliable transactional operations (illustrated below)
- Improves the availability and scalability of the manager services
- Jobs can be intervened in, supporting restart, rerun, parallelism control, resource isolation, etc.
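To make the shared-state idea concrete, here is a minimal sketch of a transactional, compare-and-swap style status transition on a shared job table. It uses SQLite only so the example is self-contained; FATE Flow itself persists this state in MySQL, and the table and column names below are assumptions.

```python
import sqlite3

# Self-contained stand-in for the shared MySQL state store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_job (job_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO t_job VALUES ('202401010000_demo', 'waiting')")
conn.commit()

def transition(conn, job_id, from_status, to_status):
    """Atomically move a job between states; returns False if another
    manager instance already changed the state (lost the race)."""
    with conn:  # wraps the UPDATE in a transaction
        cur = conn.execute(
            "UPDATE t_job SET status = ? WHERE job_id = ? AND status = ?",
            (to_status, job_id, from_status),
        )
    return cur.rowcount == 1

print(transition(conn, "202401010000_demo", "waiting", "running"))  # True
print(transition(conn, "202401010000_demo", "waiting", "running"))  # False, already running
```

Because the state lives in the shared database rather than in any single manager process, any scheduler instance can safely restart, rerun, or cancel a job by performing the same guarded transition.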
3.2 State-Driven Scheduling
- Resource coordination
- Spawn a child Executor process to run the component
- The Executor reports its state to the local server and also to the scheduler
- The task states of all parties are aggregated into the federated task state (sketched below)
- Upstream and downstream task states are combined to compute the job state
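The sketch below shows one plausible way to aggregate per-party task states into a federated task state and then fold task states into a job state; the precedence rules are illustrative assumptions, not FATE Flow's exact state machine.

```python
# Illustrative state precedence; not FATE Flow's exact rules.
def federated_task_status(party_statuses):
    """Combine the same task's status as reported by each party."""
    if "failed" in party_statuses:
        return "failed"
    if "canceled" in party_statuses:
        return "canceled"
    if all(s == "success" for s in party_statuses):
        return "success"
    return "running"

def job_status(task_statuses):
    """Fold the DAG's (upstream and downstream) task states into a job state."""
    if any(s in ("failed", "canceled") for s in task_statuses):
        return "failed"
    if all(s == "success" for s in task_statuses):
        return "success"
    return "running"

print(federated_task_status(["success", "running"]))                           # running
print(job_status(["success", federated_task_status(["failed", "running"])]))   # failed
```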
4. Multi-Party Resource Coordination
- The total resource size of each engine is configured through the configuration file; obtaining it automatically by interfacing with the engine is planned for a later version
- In the total resource size, cores_per_node indicates the number of CPU cores per compute node, and nodes indicates the number of compute nodes
- The FATE Flow server reads the resource size configuration from the configuration file at startup and registers or updates it in the database
- Resources are requested at the Job level and take effect when the Job Conf is submitted; formula: task_parallelism * task_cores (see the sketch below)
- See the dedicated section of the documentation for details
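As a worked example of the formula above, the sketch below computes the cores a job requests and checks the request against the engine's configured total; the parameter names mirror the configuration keys mentioned above, while the helper function itself is only illustrative.

```python
def job_requested_cores(task_cores: int, task_parallelism: int) -> int:
    """Cores a job applies for when its Job Conf is submitted:
    task_parallelism * task_cores."""
    return task_parallelism * task_cores

# Total engine resources from the configuration file.
cores_per_node = 16
nodes = 4
total_cores = cores_per_node * nodes          # 64 cores available

requested = job_requested_cores(task_cores=4, task_parallelism=2)  # 8 cores
print(requested, requested <= total_cores)    # 8 True -> the job can be scheduled
```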
5. Data Flow Tracking
- Definition
- metric type: metric type, such as auc, loss, ks, etc.
- metric namespace: custom metric namespace, e.g. train, predict
- metric name: custom metric name, e.g. auc0, hetero_lr_auc0
- metric data: metric data in key-value form
- metric meta: metric meta information in key-value form, supporting flexible visualization
- API (usage sketched after this list)
- log_metric_data(metric_namespace, metric_name, metrics)
- set_metric_meta(metric_namespace, metric_name, metric_meta)
- get_metric_data(metric_namespace, metric_name)
- get_metric_meta(metric_namespace, metric_name)
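The following is a self-contained, in-memory sketch of the metric API listed above, intended only to show the key-value shape of metric data and metric meta; FATE Flow's real tracker is backed by the database, and the class below is a simplified stand-in.

```python
# Simplified in-memory stand-in for the job/task tracker; the real one
# persists metrics to the database.
class Tracker:
    def __init__(self):
        self._data = {}   # (namespace, name) -> list of (key, value) points
        self._meta = {}   # (namespace, name) -> meta dict

    def log_metric_data(self, metric_namespace, metric_name, metrics):
        self._data.setdefault((metric_namespace, metric_name), []).extend(metrics)

    def set_metric_meta(self, metric_namespace, metric_name, metric_meta):
        self._meta[(metric_namespace, metric_name)] = metric_meta

    def get_metric_data(self, metric_namespace, metric_name):
        return self._data.get((metric_namespace, metric_name), [])

    def get_metric_meta(self, metric_namespace, metric_name):
        return self._meta.get((metric_namespace, metric_name), {})

tracker = Tracker()
# metric data as key-value pairs, e.g. (iteration, auc)
tracker.log_metric_data("train", "auc0", [(0, 0.68), (1, 0.72), (2, 0.74)])
# metric meta as key-value pairs used for flexible plotting
tracker.set_metric_meta("train", "auc0", {"metric_type": "auc", "unit_name": "iters"})
print(tracker.get_metric_data("train", "auc0"))
print(tracker.get_metric_meta("train", "auc0"))
```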
6. Real-Time Job Monitoring
- Job process liveness detection (a polling sketch follows this list)
- Job timeout detection
- Resource recovery detection
- Base engine session timeout detection
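A minimal sketch of how such detections can be driven is shown below: a periodic detector checks whether each running job's worker process is still alive and whether the job has exceeded its timeout. The thresholds, record fields, and POSIX-style process check are assumptions for illustration.

```python
import os
import time

# Illustrative running-job records; FATE Flow keeps these in the database.
running_jobs = [
    {"job_id": "202401010000_demo", "pid": os.getpid(),
     "start_time": time.time(), "timeout": 3600},
]

def pid_alive(pid: int) -> bool:
    """POSIX-style liveness probe: signal 0 checks existence without killing."""
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False

def detect_once(jobs):
    for job in jobs:
        if not pid_alive(job["pid"]):
            print(job["job_id"], "-> worker process lost, mark task failed / reschedule")
        elif time.time() - job["start_time"] > job["timeout"]:
            print(job["job_id"], "-> job timeout, trigger cancel and resource recovery")

# In a real service this would run on a timer thread; one pass is enough here.
detect_once(running_jobs)
```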
7. Federated Model Registry
- Google Protocol Buffers is used as the model storage protocol so that models can be shared across languages; each algorithm model consists of two parts: ModelParam & ModelMeta
- A pipeline generates a series of algorithm models
- The model named Pipeline stores the pipeline modeling DSL and the online inference DSL
- In federated learning, model consistency needs to be guaranteed across all participants, i.e. model binding
- model_key is the model identifier defined by the user when submitting the task
- The model ID of each federated party is composed of the party identification information (role, party_id) plus model_key (sketched below)
- The model version of all federated parties must be unique and consistent; FATE-Flow sets it directly to the job_id
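The sketch below shows one way the binding information described above could be assembled; the "#" separator and the helper names are assumptions for illustration, not FATE Flow's exact format.

```python
def gen_party_model_id(role: str, party_id: int, model_key: str) -> str:
    # Party identification information (role, party_id) plus the user-defined
    # model_key. The "#" separator is an illustrative choice.
    return "#".join([role, str(party_id), model_key])

def gen_model_version(job_id: str) -> str:
    # All federated parties must share a unique, consistent version;
    # FATE-Flow sets it directly to the job_id.
    return job_id

print(gen_party_model_id("guest", 9999, "hetero_lr_model"))   # guest#9999#hetero_lr_model
print(gen_model_version("202401010000123456"))                # 202401010000123456
```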
8. Data Access
Upload:
- External storage is imported directly to FATE Storage, creating a new DTable
- When the job runs, Reader reads directly from Storage
Table Bind:
- Bind the address of the external storage to a new DTable in FATE (see the config sketch after this list)
- When the job runs, the Reader reads data from the external storage via its meta information and transfers it to FATE Storage
- Connecting to the Big Data ecosystem: HDFS, Hive/MySQL
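For illustration, the sketch below shows roughly what a table bind request might carry when attaching an HDFS path as a DTable; the field names are assumptions based on the description above, not the exact FATE Flow config schema.

```python
# Rough shape of a "table bind" request: map an external storage address
# to a FATE DTable identified by (namespace, name). Field names are illustrative.
table_bind_conf = {
    "engine": "HDFS",
    "address": {
        "name_node": "hdfs://namenode:9000",
        "path": "/data/breast_hetero_guest.csv",
    },
    "namespace": "experiment",
    "name": "breast_hetero_guest",
    "head": 1,              # first line is a header
    "id_delimiter": ",",
}

# An "upload", by contrast, copies the data into FATE Storage up front, so the
# Reader can read the new DTable directly from Storage when the job runs.
```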