
Commit e7540d0

TensorFlow support for inferentia gen script (triton-inference-server#95)
* Update the gen script to support TF models
* Document the TensorFlow support for inferentia
* Address review comments
* Improve script interface and doc
* Address review comments
* Use choices in arg parser
* Remove lower
1 parent a6147e8 commit e7540d0

File tree

2 files changed (+376, -89 lines)

inferentia/README.md

Lines changed: 67 additions & 16 deletions
@@ -90,24 +90,37 @@ Please use the `-h` or `--help` options to learn about more configurable options

## Setting up the Inferentia model

-Currently, we only support TorchScript models traced by [PyTorch-Neuron trace python API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.html) for execution on Inferentia.
-Once the TorchScript model supporting Inferentia is obtained, use the [gen_triton_model.py](https://github.com/triton-inference-server/python_backend/blob/main/inferentia/scripts/gen_triton_model.py) script to generate triton python model directory.
+Currently, we only support [PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html)
+and [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/index.html)
+workflows for execution on Inferentia.

-An example invocation for the `gen_triton_model.py` can look like:
+### PyTorch
+
+For PyTorch, we support models traced by the [PyTorch-Neuron trace python API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.html)
+for execution on Inferentia.
+Once the TorchScript model supporting Inferentia is obtained, use the
+[gen_triton_model.py](scripts/gen_triton_model.py) script to generate
+the triton python model directory.
+
+An example invocation of `gen_triton_model.py` for a PyTorch model can look like:

```
-$python3 inferentia/scripts/gen_triton_model.py --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
+$python3 inferentia/scripts/gen_triton_model.py --model_type pytorch --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
```

-NOTE: Due to the absence of names for inputs and outputs in a
-TorchScript model, the name of tensor of both the inputs and
-outputs provided to the above script must follow a specific naming
-convention i.e. `<name>__<index>`. Where `<name>` can be any
-string and `<index>` refers to the position of the corresponding
-input/output. This means if there are two inputs and two outputs
-they must be named as: "INPUT__0", "INPUT__1" and "OUTPUT__0",
-"OUTPUT__1" such that "INPUT__0" refers to first input and
-INPUT__1 refers to the second input, etc.
+In order for the script to treat the compiled model as a TorchScript
+model, `--model_type pytorch` needs to be provided.
+
+NOTE: Because a TorchScript model stores no metadata for its inputs
+and outputs, the name, datatype and shape of both the input and
+output tensors must be provided to the above script, and the names
+must follow a specific naming convention, i.e. `<name>__<index>`,
+where `<name>` can be any string and `<index>` refers to the position
+of the corresponding input/output. This means that if there are two
+inputs and two outputs, they must be named "INPUT__0", "INPUT__1" and
+"OUTPUT__0", "OUTPUT__1", such that "INPUT__0" refers to the first
+input and "INPUT__1" refers to the second input, etc.

Additionally, `--neuron_core_range` specifies the neuron cores to
be used while serving this model. Currently, only
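As background for the PyTorch invocation above, the file passed to `--compiled_model` is produced by tracing the model with torch-neuron ahead of time. A minimal sketch, assuming `torch-neuron` is installed on an inf1 instance; the module and shapes below are hypothetical stand-ins, not the BERT model from the example:

```
import torch
import torch_neuron  # noqa: F401  (registers the Neuron backend under torch.neuron)

# Hypothetical stand-in model; any traceable torch.nn.Module works.
model = torch.nn.Sequential(torch.nn.Linear(384, 128), torch.nn.ReLU())
model.eval()

# Example input used by the tracer to record the graph.
example = torch.zeros(4, 384)

# torch.neuron.trace compiles the supported operators for Inferentia
# and returns a TorchScript module.
neuron_model = torch.neuron.trace(model, example_inputs=[example])

# The saved .pt file is what --compiled_model should point to.
neuron_model.save("model_neuron.pt")
```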
@@ -122,7 +135,45 @@ loaded on cores 2-3. To best engage the Inferentia device, try setting
the number of neuron cores to be a proper multiple of the instance
count.

-The invocation should create a triton model directory with following
+### TensorFlow
+For TensorFlow, the model must be compiled for AWS Neuron. See the
+[AWS Neuron TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/tutorials/index.html)
+tutorials to learn how to get a compiled model that uses Neuron
+cores. Currently, the code is tested only on `tensorflow==1.15`.
+
+Once the compiled model is obtained, use the [gen_triton_model.py](scripts/gen_triton_model.py)
+script to generate the triton python model directory.
+
+An example invocation of `gen_triton_model.py` for a TensorFlow model can look like:
+
+```
+$python3 gen_triton_model.py --model_type tensorflow --compiled_model /home/ubuntu/inferentia-poc-2.0/scripts-rn50-tf-native/resnet50_mlperf_opt_fp16_compiled_b5_nc1/1 --neuron_core_range 0:3 --triton_model_dir rn50-1neuroncores-bs1x1
+```
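For the `--compiled_model` directory consumed above, a rough sketch of compiling a SavedModel with `tensorflow-neuron` 1.15; the paths are hypothetical placeholders, and the AWS Neuron tutorials linked above remain the authoritative reference:

```
import tensorflow.neuron as tfn

# Compile a regular SavedModel into a Neuron-optimized SavedModel.
# Both directory paths are hypothetical placeholders.
tfn.saved_model.compile(
    "/path/to/resnet50_savedmodel",         # input SavedModel directory
    "/path/to/resnet50_neuron_savedmodel",  # compiled output; pass to --compiled_model
)
```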
+
+NOTE: Unlike a TorchScript model, a TensorFlow SavedModel stores sufficient
+metadata to detect the name, datatype and shape of the input and output
+tensors for the model. By default, the script will assume the compiled
+model to be TorchScript; in order for it to treat the compiled model
+as a TF SavedModel, `--model_type tensorflow` needs to be provided.
+The input and output details are read from the model itself. The user
+must have the [`tensorflow`](https://www.tensorflow.org/install/pip) python
+module installed in order to use this script for tensorflow models.
+
+Similar to PyTorch, `--neuron_core_range` and `--triton_model_instance_count`
+can be used to specify the neuron core range and the number of triton model
+instances. However, the neuron core indices don't point to a specific
+neuron core in the chip. For TensorFlow, we use the deprecated
+`NEURONCORE_GROUP_SIZES` feature to load the model. In this case the model
+will be loaded on the next available Neuron cores, not on specific ones. See
+[Parallel Execution using NEURONCORE_GROUP_SIZES](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/appnotes/perf/parallel-ncgs.html?highlight=NEURONCORE_GROUP_SIZES)
+for more information.
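To make the "next available cores" behavior concrete, a rough illustration of the `NEURONCORE_GROUP_SIZES` semantics; per the note above, the generated model uses this mechanism during serving, so this sketch is only illustrative, and the compiled-model path is hypothetical:

```
import os

# Four groups of one NeuronCore each; must be set before the Neuron
# runtime initializes inside TensorFlow.
os.environ["NEURONCORE_GROUP_SIZES"] = "1,1,1,1"

import tensorflow as tf  # tensorflow-neuron 1.15

# Each predictor lands on the next free core group rather than on an
# explicitly chosen core index.
predictors = [
    tf.contrib.predictor.from_saved_model("/path/to/compiled_model")
    for _ in range(4)
]
```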
+
+Please use the `-h` or `--help` options of `gen_triton_model.py` to
+learn about more configurable options.
+
+## Serving Inferentia model in Triton
+
+The `gen_triton_model.py` invocation should create a triton model directory with the following
structure:

```
@@ -139,7 +190,7 @@ Look at the usage message of the script to understand each option.
The script will generate a model directory with the user-provided
name. Move that model directory to Triton's model repository.
Ensure the compiled model path provided to the script points to
-a valid torchscript file.
+a valid torchscript file or tensorflow savedmodel.

Now, the server can be launched with the model as below:

@@ -151,4 +202,4 @@ Note:
1. The `config.pbtxt` and `model.py` should be treated as a
starting point. The users can customize these files as per
their need.
-2. Triton Inferentia currently only works with **single** model.
+2. Triton Inferentia is currently tested with a **single** model.
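Once the server is running, the generated model can be exercised with the standard Triton client; a hedged sketch using `tritonclient` against the PyTorch example above (model name and shapes taken from that invocation, server address assumed to be the default localhost:8000):

```
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# The positional INPUT__<i> / OUTPUT__<i> names follow the naming
# convention described in the PyTorch section above.
inputs = []
for i in range(3):
    inp = httpclient.InferInput(f"INPUT__{i}", [4, 384], "INT64")
    inp.set_data_from_numpy(np.zeros((4, 384), dtype=np.int64))
    inputs.append(inp)

result = client.infer(model_name="bert-large-mlperf-bs1x4", inputs=inputs)
print(result.as_numpy("OUTPUT__0").shape)
```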
