# Hetero-NN Quick Start: A Binary Classification Task

In this tutorial, you will learn how to use Hetero-NN. Like Homo-NN, Hetero-NN has been upgraded to support extensive customization of both models and datasets through the PyTorch backend. Customization specific to Hetero-NN is covered in a later chapter.

Hetero-NN has also improved some interfaces, such as the Interactive-layer interface, to make their usage clearer.

This chapter walks through a basic binary classification task with Hetero-NN. The workflow is consistent with other FATE algorithms: you use the reader and transformer interfaces provided by FATE to load table data, and then feed the data into the algorithm component. The component trains with the top/bottom models, optimizer, and loss function that you define. Usage is essentially the same as in the old version of FATE.

If you want to understand the principle of the Hetero-NN algorithm, you can refer to doc/federated_component/hetero_nn.md.

## Uploading Tabular Data

First, we upload data to FATE. This can be done directly with the pipeline. Here we upload two files: breast_hetero_guest.csv for the guest and breast_hetero_host.csv for the host. Note that this tutorial uses the standalone version; if you are using a cluster version, you need to upload the corresponding data on each machine.

In [1]:
from pipeline.backend.pipeline import PipeLine  # pipeline class

# we have two parties: guest, whose data has labels
#                      host, whose data has no labels
# the dataset is vertically split

dense_data_guest = {"name": "breast_hetero_guest", "namespace": "experiment"}
dense_data_host = {"name": "breast_hetero_host", "namespace": "experiment"}

guest = 9999
host = 10000

pipeline_upload = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host)

partition = 4

# upload the data
pipeline_upload.add_upload_data(file="./examples/data/breast_hetero_guest.csv",
                                table_name=dense_data_guest["name"],             # table name
                                namespace=dense_data_guest["namespace"],         # namespace
                                head=1, partition=partition)               # data info

pipeline_upload.add_upload_data(file="./examples/data/breast_hetero_host.csv",
                                table_name=dense_data_host["name"],
                                namespace=dense_data_host["namespace"],
                                head=1, partition=partition)      # data info

pipeline_upload.upload(drop=1)

 UPLOADING:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.00%

2022-12-19 11:28:35.529 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202212191128349034250
2022-12-19 11:28:35.542 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2022-12-19 11:28:36.557 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:01
2022-12-19 11:28:37.573 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:02

 UPLOADING:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.00%

2022-12-19 11:28:45.106 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202212191128447632710
2022-12-19 11:28:45.118 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2022-12-19 11:28:46.133 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:01
2022-12-19 11:28:47.161 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:02

The breast dataset is a binary classification dataset with 30 features. It is vertically split: the guest holds 10 features and the label, while the host holds 20 features.
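
To make the vertical split concrete, here is a small sketch in plain Python. It is not FATE code: the toy records, the column names, and the `vertical_split` helper are assumptions made up for illustration. The only point is that both parties hold the same sample ids but disjoint feature columns, and only the guest holds the label.

```python
# Toy illustration of a vertical (feature-wise) split; not FATE code.
# All names and values below are hypothetical.

def vertical_split(rows, guest_cols, host_cols, id_col="id", label_col="y"):
    """Split each full record into a guest view (label + some features)
    and a host view (remaining features, no label)."""
    guest_rows, host_rows = [], []
    for row in rows:
        guest_rows.append({id_col: row[id_col], label_col: row[label_col],
                           **{c: row[c] for c in guest_cols}})
        host_rows.append({id_col: row[id_col],
                          **{c: row[c] for c in host_cols}})
    return guest_rows, host_rows


full = [
    {"id": 133, "y": 1, "a": 0.25, "b": -1.05, "c": 0.45},
    {"id": 273, "y": 1, "a": -1.14, "b": -0.78, "c": -1.25},
    {"id": 199, "y": 0, "a": 0.43, "b": 0.72, "c": 0.09},
]

guest_view, host_view = vertical_split(full, guest_cols=["a"], host_cols=["b", "c"])
print(guest_view[0])  # id, label, and the guest's feature
print(host_view[0])   # id and the host's features, no label
```

The two views can only be joined back through the shared id column, which is why an intersection step is needed before training.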

In [2]:
import pandas as pd
df = pd.read_csv('../../../../examples/data/breast_hetero_guest.csv')
df

Unnamed: 0,id,y,x0,x1,x2,x3,x4,x5,x6,x7,x8,x9
0,133,1,0.254879,-1.046633,0.209656,0.074214,-0.441366,-0.377645,-0.485934,0.347072,-0.287570,-0.733474
1,273,1,-1.142928,-0.781198,-1.166747,-0.923578,0.628230,-1.021418,-1.111867,-0.959523,-0.096672,-0.121683
2,175,1,-1.451067,-1.406518,-1.456564,-1.092337,-0.708765,-1.168557,-1.305831,-1.745063,-0.499499,-0.302893
3,551,1,-0.879933,0.420589,-0.877527,-0.780484,-1.037534,-0.483880,-0.555498,-0.768581,0.433960,-0.200928
4,199,0,0.426758,0.723479,0.316885,0.287273,1.000835,0.962702,1.077099,1.053586,2.996525,0.961696
...,...,...,...,...,...,...,...,...,...,...,...,...
564,529,1,-0.583805,-1.613330,-0.605880,-0.581312,0.864944,-0.579301,-0.527672,-0.619360,-0.193738,-0.189844
565,40,0,-0.070240,0.744648,-0.141817,-0.162929,-1.006849,-0.317847,-0.305547,-0.051865,0.150849,-0.691912
566,115,1,-0.538247,0.076989,-0.587413,-0.523125,0.772888,-0.091382,-0.584763,-0.641591,-0.748637,0.081139
567,2,0,1.511870,-0.023974,1.347475,1.456285,0.527407,1.082932,0.854974,1.955000,1.152255,0.201391


In [3]:
import pandas as pd
df = pd.read_csv('../../../../examples/data/breast_hetero_host.csv')
df

Unnamed: 0,id,x0,x1,x2,x3,x4,x5,x6,x7,x8,...,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19
0,133,0.449512,-1.247226,0.413178,0.303781,-0.123848,-0.184227,-0.219076,0.268537,0.015996,...,-0.337360,-0.728193,-0.442587,-0.272757,-0.608018,-0.577235,-0.501126,0.143371,-0.466431,-0.554102
1,273,-1.245485,-0.842317,-1.255026,-1.038066,-0.426301,-1.088781,-0.976392,-0.898898,0.983496,...,-0.493639,0.348620,-0.552483,-0.526877,2.253098,-0.827620,-0.780739,-0.376997,-0.310239,0.176301
2,175,-1.549664,-1.126219,-1.546652,-1.216392,-0.354424,-1.167051,-1.114873,-1.261820,-0.327193,...,-0.666881,-0.779358,-0.708418,-0.637545,0.710369,-0.976454,-1.057501,-1.913447,0.795207,-0.149751
3,551,-0.851273,0.733108,-0.843535,-0.786363,-0.049836,-0.424532,-0.509221,-0.679649,0.797298,...,-0.451772,0.453852,-0.431696,-0.494754,-1.182041,0.281228,0.084759,-0.252420,1.038575,0.351054
4,199,0.091654,0.216499,0.103839,-0.034667,0.167930,0.308132,0.366614,0.280661,0.505223,...,-0.707304,-1.026834,-0.702973,-0.460212,-0.999033,-0.531406,-0.394360,-0.728830,-0.644416,-0.688003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,529,-0.584300,-1.361252,-0.582390,-0.596377,0.970677,-0.270077,-0.640169,-0.540104,-0.564504,...,-0.555357,-1.293361,-0.570305,-0.479573,0.095344,-0.779555,-0.461337,-0.618041,-0.111671,-0.590414
565,40,-0.195201,0.532980,-0.238451,-0.261342,-1.048999,-0.834452,-0.724413,-0.737944,-0.100834,...,-0.601554,-0.708235,-0.640599,-0.435790,-1.253710,-0.808059,-0.596618,-0.797283,-0.816347,-0.948996
566,115,-0.624062,0.521345,-0.635937,-0.615148,0.093918,-0.489914,-0.697043,-0.743876,-0.451325,...,-0.336999,-0.533695,-0.428726,-0.342062,0.254017,-0.022811,-0.449069,-0.662649,-0.939848,0.023110
567,2,1.579888,0.456187,1.566503,1.558884,0.942210,1.052926,1.363478,2.037231,0.939685,...,1.228676,-0.780083,0.850928,1.181336,-0.297005,0.814974,0.213076,1.424827,0.237036,0.293559


## Writing and Executing the Pipeline Script

After the upload is complete, we can start writing the pipeline script to submit a FATE task.

In [14]:
import torch as t
from torch import nn
from pipeline.backend.pipeline import PipeLine  # pipeline Class
from pipeline import fate_torch_hook
from pipeline.component import HeteroNN, Reader, DataTransform, Intersection  # Hetero NN Component, Data IO component, PSI component
from pipeline.interface import Data, Model # data, model for defining the work flow

### fate_torch_hook

Be sure to execute the fate_torch_hook function below. It modifies some torch classes so that the layers, Sequential models, optimizers, and loss functions you define in the script can be parsed and submitted by the pipeline.
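
The general idea behind such a hook can be pictured with a small, self-contained sketch. This is not FATE's implementation: `record_init_args`, the stand-in `Linear` class, and `to_config` are hypothetical names. It shows one way a wrapper could make a class remember its constructor arguments so that an object can later be turned into a plain, serializable description.

```python
import functools
import json


def record_init_args(cls):
    """Wrap cls.__init__ so every instance remembers how it was built."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, *args, **kwargs):
        # Store the constructor call before delegating to the real __init__.
        self._init_record = {
            "class": cls.__name__,
            "args": list(args),
            "kwargs": dict(kwargs),
        }
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls


class Linear:
    """Stand-in for a layer class; hypothetical, not torch.nn.Linear."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


def to_config(obj):
    """Return a JSON string describing how obj was constructed."""
    return json.dumps(obj._init_record, sort_keys=True)


record_init_args(Linear)            # "hook" the class once
layer = Linear(10, out_features=2)  # construction is now recorded
config = to_config(layer)
print(config)
```

A description of this kind can be sent along with a job, which is roughly why the hook has to run before any layers or optimizers are created.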

In [15]:
from pipeline import fate_torch_hook
t = fate_torch_hook(t)

In [16]:

guest = 9999
host = 10000
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host)

guest_train_data = {"name": "breast_hetero_guest", "namespace": "experiment"}
host_train_data = {"name": "breast_hetero_host", "namespace": "experiment"}

# read uploaded dataset
reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=guest_train_data)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=host_train_data)
# The transform component converts the uploaded data to the FATE standard format
data_transform_0 = DataTransform(name="data_transform_0")
data_transform_0.get_party_instance(role='guest', party_id=guest).component_param(with_label=True)
data_transform_0.get_party_instance(role='host', party_id=host).component_param(with_label=False)
# intersection
intersection_0 = Intersection(name="intersection_0")
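
Conceptually, the intersection step keeps only the samples that both parties hold, so that rows can be aligned by id before training. The sketch below shows that alignment with plain Python sets on made-up ids. It deliberately ignores the privacy-preserving protocol that a real intersection component would use to avoid revealing non-shared ids, and `align_by_id` is a hypothetical helper, not a FATE function.

```python
def align_by_id(guest_rows, host_rows):
    """Return (guest, host) rows restricted to shared ids, in the same id order."""
    shared = sorted(set(guest_rows) & set(host_rows))
    guest_aligned = [(i, guest_rows[i]) for i in shared]
    host_aligned = [(i, host_rows[i]) for i in shared]
    return guest_aligned, host_aligned


# id -> feature values; ids 999 and 42 exist on one side only (toy data)
guest_rows = {133: [0.25], 273: [-1.14], 999: [0.5]}
host_rows = {273: [-1.25, -0.84], 133: [0.45, -1.25], 42: [0.0, 0.0]}

guest_aligned, host_aligned = align_by_id(guest_rows, host_rows)
print([i for i, _ in guest_aligned])
print([i for i, _ in host_aligned])
```

After alignment, row k on the guest side and row k on the host side describe the same sample, which is what the training component relies on.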

### The Hetero NN Component

Here we initialize the Hetero-NN component and use get_party_instance to obtain the guest and host components separately. Because the two parties have different model architectures, the model parameters must be specified on each party's own component.

In [17]:
hetero_nn_0 = HeteroNN(name="hetero_nn_0", epochs=2,
                       interactive_layer_lr=0.01, batch_size=-1, validation_freqs=1, task_type='classification', seed=114514)
guest_nn_0 = hetero_nn_0.get_party_instance(role='guest', party_id=guest)
host_nn_0 = hetero_nn_0.get_party_instance(role='host', party_id=host)

### Defining Guest & Host Model

In [18]:
# Guest Bottom, Top Model
guest_bottom = t.nn.Sequential(
    nn.Linear(10, 2),
    nn.ReLU()
)
guest_top = t.nn.Sequential(
    nn.Linear(2, 1),
    nn.Sigmoid()
)

# Host Bottom Model
host_bottom = t.nn.Sequential(
    nn.Linear(20, 2),
    nn.ReLU()
)

# After fate_torch_hook, the nn module provides InteractiveLayer; print it to view its structure
interactive_layer = t.nn.InteractiveLayer(out_dim=2, guest_dim=2, host_dim=2, host_num=1)
print(interactive_layer)

guest_nn_0.add_top_model(guest_top)
guest_nn_0.add_bottom_model(guest_bottom)
host_nn_0.add_bottom_model(host_bottom)

optimizer = t.optim.Adam(lr=0.01)  # Note: after fate_torch_hook, the optimizer can be created without model parameters
loss = t.nn.BCELoss()

hetero_nn_0.set_interactive_layer(interactive_layer)
hetero_nn_0.compile(optimizer=optimizer, loss=loss)

InteractiveLayer(
  (activation): ReLU()
  (guest_model): Linear(in_features=2, out_features=2, bias=True)
  (host_model): ModuleList(
    (0): Linear(in_features=2, out_features=2, bias=True)
  )
  (act_seq): Sequential(
    (0): ReLU()
  )
)
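
To see how the pieces defined above fit together, the following plain-Python sketch traces one sample through layers with the same shapes: guest bottom (10 to 2), host bottom (20 to 2), interactive layer (2 and 2 to 2), and guest top (2 to 1). It is only a plaintext illustration with random weights and random inputs. It assumes that the interactive layer adds the guest and host projections before applying ReLU, which is a reading of the printed structure rather than a confirmed detail, and it ignores any encryption or communication between the parties.

```python
import math
import random

random.seed(0)


def linear(in_dim, out_dim):
    """Create a randomly initialized dense layer as (weights, bias)."""
    w = [[random.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b


def forward(layer, x):
    """Compute W x + b for one input vector."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


# Layer shapes mirror the tutorial's models.
guest_bottom = linear(10, 2)
host_bottom = linear(20, 2)
inter_guest = linear(2, 2)   # guest branch of the interactive layer
inter_host = linear(2, 2)    # host branch of the interactive layer
guest_top = linear(2, 1)

x_guest = [random.gauss(0, 1) for _ in range(10)]  # guest's 10 features
x_host = [random.gauss(0, 1) for _ in range(20)]   # host's 20 features

g = relu(forward(guest_bottom, x_guest))
h = relu(forward(host_bottom, x_host))
# Assumption: both projections are summed, then passed through ReLU.
z = relu([a + b for a, b in zip(forward(inter_guest, g), forward(inter_host, h))])
score = sigmoid(forward(guest_top, z))[0]
print(len(g), len(h), len(z), score)
```

The dimensions passed to InteractiveLayer (guest_dim=2, host_dim=2, out_dim=2) have to match the output sizes of the bottom models and the input size of the top model, which is what the shape trace makes visible.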


In [19]:
pipeline.add_component(reader_0)
pipeline.add_component(data_transform_0, data=Data(data=reader_0.output.data))
pipeline.add_component(intersection_0, data=Data(data=data_transform_0.output.data))
pipeline.add_component(hetero_nn_0, data=Data(train_data=intersection_0.output.data))
pipeline.compile()

<pipeline.backend.pipeline.PipeLine at 0x7f0613acc0d0>

In [20]:
pipeline.fit()

2022-12-19 11:59:51.084 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202212191159500270390
2022-12-19 11:59:51.107 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2022-12-19 11:59:52.127 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:01
2022-12-19 11:59:53.157 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component reader_0, time elapse: 0:00:02

## Get Component Output

In [21]:
# get prediction scores
pipeline.get_component('hetero_nn_0').get_output_data()

Unnamed: 0,id,label,predict_result,predict_score,predict_detail,type
0,0,0.0,0,0.13979218900203705,"{'0': 0.860207810997963, '1': 0.13979218900203...",train
1,1,0.0,0,0.19935783743858337,"{'0': 0.8006421625614166, '1': 0.1993578374385...",train
2,2,0.0,0,0.2489972561597824,"{'0': 0.7510027438402176, '1': 0.2489972561597...",train
3,3,0.0,0,0.25491416454315186,"{'0': 0.7450858354568481, '1': 0.2549141645431...",train
4,4,1.0,0,0.2584167718887329,"{'0': 0.7415832281112671, '1': 0.2584167718887...",train
...,...,...,...,...,...,...
564,564,0.0,0,0.19034752249717712,"{'0': 0.8096524775028229, '1': 0.1903475224971...",train
565,565,1.0,0,0.261306494474411,"{'0': 0.738693505525589, '1': 0.261306494474411}",train
566,566,1.0,0,0.26077690720558167,"{'0': 0.7392230927944183, '1': 0.2607769072055...",train
567,567,1.0,0,0.2625167667865753,"{'0': 0.7374832332134247, '1': 0.2625167667865...",train
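
The relationship between the output columns can be read off the table: predict_detail holds the probability of each class, and predict_score is the probability of class 1. The sketch below reproduces that relationship in plain Python. The 0.5 decision threshold and the `to_prediction` helper are assumptions for illustration, and the last score in the list is made up to show a positive prediction.

```python
def to_prediction(score, threshold=0.5):
    """Turn a positive-class probability into (label, per-class detail)."""
    label = 1 if score > threshold else 0       # assumed threshold of 0.5
    detail = {"0": 1.0 - score, "1": score}     # probabilities of both classes
    return label, detail


# First two values come from the table above; 0.9 is a hypothetical score.
scores = [0.13979218900203705, 0.261306494474411, 0.9]
for s in scores:
    print(to_prediction(s))
```

Under this reading, every row shown above gets predict_result 0 because all displayed scores are below 0.5 after only two epochs of training.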


In [23]:
# get summary
pipeline.get_component('hetero_nn_0').get_summary()

{'best_iteration': -1,
 'history_loss': [0.9929580092430115, 0.9658427238464355],
 'is_converged': False,
 'validation_metrics': {'train': {'auc': [0.8850615717985308,
    0.9316368056656624],
   'ks': [0.6326568363194334, 0.7479123724961683]}}}
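
The summary reports AUC and KS once per epoch. As a reminder of what these two metrics measure, here is a minimal plain-Python sketch that computes both from labels and scores on a toy example. It is an illustrative implementation, not the code FATE uses, and the labels and scores are made up.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points, sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(order):
        s = order[i][0]
        # Consume all samples tied at this score together.
        while i < len(order) and order[i][0] == s:
            if order[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks(points):
    """Kolmogorov-Smirnov statistic: largest gap between TPR and FPR."""
    return max(tpr - fpr for fpr, tpr in points)


labels = [1, 1, 0, 1, 0, 0]              # toy ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # toy predicted probabilities

points = roc_points(labels, scores)
auc_value = auc(points)
ks_value = ks(points)
print(auc_value, ks_value)
```

Both metrics depend only on how the scores rank positives against negatives, which is why AUC and KS can already be high in the summary while all thresholded predictions are still 0.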

So far, we have gained a basic understanding of Hetero-NN and used it to perform a basic modeling task. Hetero-NN also supports more complex models and datasets. For further information, refer to the additional tutorials provided.