* Update the gen script to support TF models
* Document the TensorFlow support for inferentia
* Address review comments
* Improve script interface and doc
* Address review comments
* Use choices in arg parser
* Remove lower
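The "Use choices in arg parser" and "Remove lower" bullets can be sketched as follows. The `--model_type` flag and its two values come from the documentation below; the parser wiring itself is an illustrative assumption, not the script's actual code:

```python
import argparse

# Sketch of restricting --model_type via argparse `choices` (assumed wiring;
# the real gen_triton_model.py may differ). With `choices`, argparse rejects
# invalid values on its own, so a manual .lower() normalization step can be
# removed from the script.
parser = argparse.ArgumentParser(
    description="Generate a triton python model directory (illustrative)"
)
parser.add_argument(
    "--model_type",
    choices=["pytorch", "tensorflow"],
    default="pytorch",
    help="Framework of the compiled model",
)

args = parser.parse_args(["--model_type", "tensorflow"])
print(args.model_type)
```

Passing any value outside the `choices` list makes argparse exit with a usage error, which is why the explicit lowercasing became unnecessary.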
inferentia/README.md
67 additions & 16 deletions
@@ -90,24 +90,37 @@ Please use the `-h` or `--help` options to learn about more configurable options
 
 ## Setting up the Inferentia model
 
-Currently, we only support TorchScript models traced by [PyTorch-Neuron trace python API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.html) for execution on Inferentia.
-Once the TorchScript model supporting Inferentia is obtained, use the [gen_triton_model.py](https://github.com/triton-inference-server/python_backend/blob/main/inferentia/scripts/gen_triton_model.py) script to generate triton python model directory.
+Currently, we only support [PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html)
+and [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/index.html)
+workflows for execution on Inferentia.
 
-An example invocation for the `gen_triton_model.py` can look like:
+### PyTorch
+
+For PyTorch, we support models traced by the [PyTorch-Neuron trace python API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.html)
+for execution on Inferentia.
+Once the TorchScript model supporting Inferentia is obtained, use the
+[gen_triton_model.py](scripts/gen_triton_model.py) script to generate the
+triton python model directory.
+
+An example invocation of `gen_triton_model.py` for a PyTorch model can look like:
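The example block itself is elided from this diff. A hypothetical invocation might look like the following; the path and model-directory name are placeholders, and the `--compiled_model` and `--triton_model_dir` flag names are assumptions for illustration (only `--model_type` and `--neuron_core_range` are confirmed by the text of this README):

```
$ python3 inferentia/scripts/gen_triton_model.py \
      --model_type pytorch \
      --compiled_model /path/to/traced_model.pt \
      --neuron_core_range 0:3 \
      --triton_model_dir my-inferentia-model
```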
 metadata to detect the name, datatype and shape of the input and output
+tensors for the model. By default, the script will assume the compiled
+model to be TorchScript. In order for it to treat the compiled model
+as a TensorFlow SavedModel, `--model_type tensorflow` needs to be provided.
+The input and output details are read from the model itself. The user
+must have the [`tensorflow`](https://www.tensorflow.org/install/pip) python
+module installed in order to use this script for tensorflow models.
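Because the `tensorflow` module is only needed for this mode, a caller can check for it up front. This helper is purely illustrative and is not part of `gen_triton_model.py`:

```python
import importlib.util

# Illustrative pre-flight check (not part of the actual script): the script
# needs the `tensorflow` python module to read input/output details from a
# SavedModel, so fail early with a clear message if it is missing.
def has_module(name: str) -> bool:
    return importlib.util.find_spec(name) is not None

if not has_module("tensorflow"):
    print("tensorflow is not installed; `--model_type tensorflow` will not work")
```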
+
+Similar to PyTorch, `--neuron_core_range` and `--triton_model_instance_count`
+can be used to specify the neuron core range and the number of triton model
+instances. However, the neuron core indices don't point to a specific
+neuron core in the chip. For TensorFlow, we use the deprecated
+`NEURONCORE_GROUP_SIZES` feature to load the model. The model in this case
+will be loaded on the next available Neuron cores and not on specific ones. See
+[Parallel Execution using NEURONCORE_GROUP_SIZES](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/appnotes/perf/parallel-ncgs.html?highlight=NEURONCORE_GROUP_SIZES)
+for more information.
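As a minimal sketch of how this mechanism is driven: `NEURONCORE_GROUP_SIZES` is an environment variable read by the Neuron runtime at model-load time. The variable name is real, but the value and the placement of this snippet are illustrative assumptions, not code from this repository:

```python
import os

# NEURONCORE_GROUP_SIZES must be set before the Neuron runtime loads the
# model; "2" requests one group of 2 NeuronCores (illustrative value only).
# This is a deprecated mechanism -- see the parallel-ncgs app note for details.
os.environ["NEURONCORE_GROUP_SIZES"] = "2"
print(os.environ["NEURONCORE_GROUP_SIZES"])
```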
+
+Please use the `-h` or `--help` options in `gen_triton_model.py` to
+learn about more configurable options.
+
+## Serving Inferentia model in Triton
+
+The `gen_triton_model.py` script should create a triton model directory with the following
 structure:
 
 ```
@@ -139,7 +190,7 @@ Look at the usage message of the script to understand each option.
 
 The script will generate a model directory with the user-provided
 name. Move that model directory to Triton's model repository.
 Ensure the compiled model path provided to the script points to
-a valid torchscript file.
+a valid TorchScript file or TensorFlow SavedModel.
 
 Now, the server can be launched with the model as below:
@@ -151,4 +202,4 @@ Note:
 1. The `config.pbtxt` and `model.py` should be treated as
 a starting point. The users can customize these files as per
 their need.
-2. Triton Inferentia currently only works with **single** model.
+2. Triton Inferentia is currently tested with a **single** model.