
Commit 0413e46

1 parent 34a4db5 commit 0413e46

6 files changed: +97 -69 lines changed


examples/auto_complete/README.md

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -59,12 +59,12 @@ respectively.
 1. Create the model repository:

 ```console
-$ mkdir -p models/nobatch_auto_complete/1/
-$ mkdir -p models/batch_auto_complete/1/
+mkdir -p models/nobatch_auto_complete/1/
+mkdir -p models/batch_auto_complete/1/

 # Copy the Python models
-$ cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
-$ cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
+cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
+cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
 ```
 **Note that we don't need a model configuration file since Triton will use the
 auto-complete model configuration provided in the Python model.**
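
For context, the note in this diff relies on the Python backend's auto-complete feature: the model itself supplies its configuration through an `auto_complete_config` method, so no `config.pbtxt` is needed. A minimal sketch of that pattern, with illustrative tensor names and shapes rather than the ones used by the example models:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Register the model's inputs/outputs so Triton can complete the
        # configuration even though no config.pbtxt is in the repository.
        auto_complete_model_config.add_input(
            {"name": "INPUT0", "data_type": "TYPE_FP32", "dims": [4]})
        auto_complete_model_config.add_output(
            {"name": "OUTPUT0", "data_type": "TYPE_FP32", "dims": [4]})
        # 0 disables batching; a batching model would set a positive value
        # (and typically enable dynamic batching) here instead.
        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config

    def execute(self, requests):
        # Trivial pass-through body so the sketch is a complete model.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
```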

examples/bls/README.md

Lines changed: 19 additions & 19 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -51,17 +51,17 @@ final outputs.
 1. Create the model repository:

 ```console
-$ mkdir -p models/add_sub/1
-$ mkdir -p models/bls_sync/1
-$ mkdir -p models/pytorch/1
+mkdir -p models/add_sub/1
+mkdir -p models/bls_sync/1
+mkdir -p models/pytorch/1

 # Copy the Python models
-$ cp examples/add_sub/model.py models/add_sub/1/
-$ cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
-$ cp examples/bls/sync_model.py models/bls_sync/1/model.py
-$ cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
-$ cp examples/pytorch/model.py models/pytorch/1/
-$ cp examples/pytorch/config.pbtxt models/pytorch/
+cp examples/add_sub/model.py models/add_sub/1/
+cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
+cp examples/bls/sync_model.py models/bls_sync/1/model.py
+cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
+cp examples/pytorch/model.py models/pytorch/1/
+cp examples/pytorch/config.pbtxt models/pytorch/
 ```

 2. Start the tritonserver:
@@ -124,17 +124,17 @@ to construct the final inference response object using these tensors.
 1. Create the model repository:

 ```console
-$ mkdir -p models/add_sub/1
-$ mkdir -p models/bls_async/1
-$ mkdir -p models/pytorch/1
+mkdir -p models/add_sub/1
+mkdir -p models/bls_async/1
+mkdir -p models/pytorch/1

 # Copy the Python models
-$ cp examples/add_sub/model.py models/add_sub/1/
-$ cp examples/add_sub/config.pbtxt models/add_sub/
-$ cp examples/bls/async_model.py models/bls_async/1/model.py
-$ cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
-$ cp examples/pytorch/model.py models/pytorch/1/
-$ cp examples/pytorch/config.pbtxt models/pytorch/
+cp examples/add_sub/model.py models/add_sub/1/
+cp examples/add_sub/config.pbtxt models/add_sub/
+cp examples/bls/async_model.py models/bls_async/1/model.py
+cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
+cp examples/pytorch/model.py models/pytorch/1/
+cp examples/pytorch/config.pbtxt models/pytorch/
 ```

 2. Start the tritonserver:
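
For context on the BLS example edited above: the `bls_sync` and `bls_async` models call the other models in the repository (`add_sub`, `pytorch`) from inside their own `execute` and assemble the final response from the returned tensors. A rough sync-style sketch with illustrative tensor names, not the exact code shipped in `sync_model.py`:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            in1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")
            # Issue a BLS request against another model loaded on the same
            # server and wait for it synchronously.
            bls_request = pb_utils.InferenceRequest(
                model_name="add_sub",
                requested_output_names=["OUTPUT0", "OUTPUT1"],
                inputs=[in0, in1])
            bls_response = bls_request.exec()
            if bls_response.has_error():
                raise pb_utils.TritonModelException(
                    bls_response.error().message())
            # Repackage the BLS outputs as this model's own response tensors.
            out0 = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT0")
            out1 = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT1")
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out0, out1]))
        return responses
```

The async variant follows the same shape but awaits `bls_request.async_exec()` from an `async def execute` instead of blocking on `exec()`.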

examples/decoupled/README.md

Lines changed: 7 additions & 7 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -47,14 +47,14 @@ real deployment, the model should not allow the caller thread to return from
 1. Create the model repository:

 ```console
-$ mkdir -p models/repeat_int32/1
-$ mkdir -p models/square_int32/1
+mkdir -p models/repeat_int32/1
+mkdir -p models/square_int32/1

 # Copy the Python models
-$ cp examples/decoupled/repeat_model.py models/repeat_int32/1/model.py
-$ cp examples/decoupled/repeat_config.pbtxt models/repeat_int32/config.pbtxt
-$ cp examples/decoupled/square_model.py models/square_int32/1/model.py
-$ cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
+cp examples/decoupled/repeat_model.py models/repeat_int32/1/model.py
+cp examples/decoupled/repeat_config.pbtxt models/repeat_int32/config.pbtxt
+cp examples/decoupled/square_model.py models/square_int32/1/model.py
+cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
 ```

 2. Start the tritonserver:
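
The decoupled models referenced above (`repeat_int32`, `square_int32`) return zero or more responses per request through a response sender rather than from `execute` itself. A minimal sketch of that pattern with an illustrative tensor name; the real repeat/square models in this example are more involved:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            count = int(pb_utils.get_input_tensor_by_name(
                request, "IN").as_numpy()[0])
            # Stream one response per requested repetition.
            for i in range(count):
                out = pb_utils.Tensor("OUT", np.array([i], dtype=np.int32))
                sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            # Close the response stream for this request.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        # Decoupled models return None; all responses flow through the sender.
        return None
```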

examples/jax/README.md

Lines changed: 12 additions & 12 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -42,9 +42,9 @@ First, download the [client.py](client.py), [config.pbtxt](config.pbtxt) and
 Next, at the directory where the three files located, create the model
 repository with the following commands:
 ```
-$ mkdir -p models/jax/1
-$ mv model.py models/jax/1
-$ mv config.pbtxt models/jax
+mkdir -p models/jax/1
+mv model.py models/jax/1
+mv config.pbtxt models/jax
 ```

 ## Pull the Triton Docker images
@@ -55,16 +55,16 @@ to the

 To pull the latest containers, run the following commands:
 ```
-$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
-$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
+docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
 ```
 See the installation steps above for the `<yy.mm>` version.

 At the time of writing, the latest version is `23.04`, which translates to the
 following commands:
 ```
-$ docker pull nvcr.io/nvidia/tritonserver:23.04-py3
-$ docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:23.04-py3
+docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
 ```

 Be sure to replace the `<yy.mm>` with the version pulled for all the remaining
@@ -75,7 +75,7 @@ parts of this example.
 At the directory where we created the JAX models (at where the "models" folder
 is located), run the following command:
 ```
-$ docker run --gpus all -it --rm -p 8000:8000 -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3 /bin/bash
+docker run --gpus all -it --rm -p 8000:8000 -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3 /bin/bash
 ```

 Inside the container, we need to install JAX to run this example.
@@ -87,12 +87,12 @@ dependencies.

 To install for this example, run the following command:
 ```
-$ pip3 install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
+pip3 install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
 ```

 Finally, we need to start the Triton Server, run the following command:
 ```
-$ tritonserver --model-repository=/jax/models
+tritonserver --model-repository=/jax/models
 ```

 To leave the container for the next step, press: `CTRL + P + Q`.
@@ -101,7 +101,7 @@ To leave the container for the next step, press: `CTRL + P + Q`.

 At the directory where the client.py is located, run the following command:
 ```
-$ docker run --rm --net=host -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk python3 /jax/client.py
+docker run --rm --net=host -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk python3 /jax/client.py
 ```

 A successful inference will print the following at the end:
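
For context on that last step: `client.py` sends input tensors to the `jax` model over HTTP and prints the returned outputs. A rough sketch of such a client using `tritonclient`; the tensor names, shapes, and dtypes are illustrative and may not match the shipped script:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Two illustrative FP32 vectors of length 4.
input0 = np.random.rand(4).astype(np.float32)
input1 = np.random.rand(4).astype(np.float32)

inputs = [
    httpclient.InferInput("INPUT0", list(input0.shape), "FP32"),
    httpclient.InferInput("INPUT1", list(input1.shape), "FP32"),
]
inputs[0].set_data_from_numpy(input0)
inputs[1].set_data_from_numpy(input1)

result = client.infer(model_name="jax", inputs=inputs)
print("OUTPUT0:", result.as_numpy("OUTPUT0"))
print("OUTPUT1:", result.as_numpy("OUTPUT1"))
```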

examples/preprocessing/README.md

Lines changed: 42 additions & 14 deletions
@@ -1,43 +1,71 @@
+<!--
+# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+# * Neither the name of NVIDIA CORPORATION nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+-->
+
 # **Preprocessing Using Python Backend Example**
 This example shows how to preprocess your inputs using Python backend before it is passed to the TensorRT model for inference. This ensemble model includes an image preprocessing model (preprocess) and a TensorRT model (resnet50_trt) to do inference.

 **1. Converting PyTorch Model to ONNX format:**

 Run onnx_exporter.py to convert ResNet50 PyTorch model to ONNX format. Width and height dims are fixed at 224 but dynamic axes arguments for dynamic batching are used. Commands from the 2. and 3. subsections shall be executed within this Docker container.

-$ docker run -it --gpus=all -v $(pwd):/workspace nvcr.io/nvidia/pytorch:xx.yy-py3 bash
-$ pip install numpy pillow torchvision
-$ python onnx_exporter.py --save model.onnx
+docker run -it --gpus=all -v $(pwd):/workspace nvcr.io/nvidia/pytorch:xx.yy-py3 bash
+pip install numpy pillow torchvision
+python onnx_exporter.py --save model.onnx

 **2. Create the model repository:**

-$ mkdir -p model_repository/ensemble_python_resnet50/1
-$ mkdir -p model_repository/preprocess/1
-$ mkdir -p model_repository/resnet50_trt/1
+mkdir -p model_repository/ensemble_python_resnet50/1
+mkdir -p model_repository/preprocess/1
+mkdir -p model_repository/resnet50_trt/1

 # Copy the Python model
-$ cp model.py model_repository/preprocess/1
+cp model.py model_repository/preprocess/1

 **3. Build a TensorRT engine for the ONNX model**

 Set the arguments for enabling fp16 precision --fp16. To enable dynamic shapes use --minShapes, --optShapes, and maxShapes with --explicitBatch:

-$ trtexec --onnx=model.onnx --saveEngine=./model_repository/resnet50_trt/1/model.plan --explicitBatch --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:256x3x224x224 --fp16
+trtexec --onnx=model.onnx --saveEngine=./model_repository/resnet50_trt/1/model.plan --explicitBatch --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:256x3x224x224 --fp16

 **4. Run the command below to start the server container:**

 Under python_backend/examples/preprocessing, run this command to start the server docker container:

-$ docker run --gpus=all -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:xx.yy-py3 bash
-$ pip install numpy pillow torchvision
-$ tritonserver --model-repository=/models
+docker run --gpus=all -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:xx.yy-py3 bash
+pip install numpy pillow torchvision
+tritonserver --model-repository=/models

 **5. Start the client to test:**

 Under python_backend/examples/preprocessing, run the commands below to start the client Docker container:

-$ wget https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg -O "mug.jpg"
-$ docker run --rm --net=host -v $(pwd):/workspace/ nvcr.io/nvidia/tritonserver:xx.yy-py3-sdk python client.py --image mug.jpg
-$ The result of classification is:COFFEE MUG
+wget https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg -O "mug.jpg"
+docker run --rm --net=host -v $(pwd):/workspace/ nvcr.io/nvidia/tritonserver:xx.yy-py3-sdk python client.py --image mug.jpg
+The result of classification is:COFFEE MUG

 Here, since we input an image of "mug" and the inference result is "COFFEE MUG" which is correct.
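
Step 1 of the README above describes what `onnx_exporter.py` does: export ResNet50 to ONNX with the spatial dims fixed at 224 and a dynamic batch axis so the TensorRT engine can later be built for dynamic batching. A rough sketch of that kind of export (the actual script's flags and axis names may differ):

```python
import torch
import torchvision.models as models

# Export ResNet50 to ONNX with a dynamic batch dimension; height and width
# stay fixed at 224. Illustrative sketch, not the example's onnx_exporter.py.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```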

inferentia/README.md

Lines changed: 12 additions & 12 deletions
@@ -60,18 +60,18 @@ or simply clone with https.
 Clone this repo with Github to home repo `/home/ubuntu`.

 ```
-$chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
-$sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
+chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
+sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
 ```

 Then, start the Triton instance with:
 ```
-$docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
+docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
 ```
 Note 1: The user would need to list any neuron device to run during container initialization.
 For example, to use 4 neuron devices on an instance, the user would need to run with:
 ```
-$docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...`
+docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...`
 ```
 Note 2: `/mylib/udev` is used for Neuron parameter passing.

@@ -81,7 +81,7 @@ Note 3: For Triton container version xx.yy, please refer to

 After starting the Triton container, go into the `python_backend` folder and run the setup script.
 ```
-$source /home/ubuntu/python_backend/inferentia/scripts/setup.sh
+source /home/ubuntu/python_backend/inferentia/scripts/setup.sh
 ```
 This script will:
 1. Install necessary dependencies
@@ -118,7 +118,7 @@ triton python model directory.
 An example invocation for the `gen_triton_model.py` for PyTorch model can look like:

 ```
-$python3 inferentia/scripts/gen_triton_model.py --model_type pytorch --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
+python3 inferentia/scripts/gen_triton_model.py --model_type pytorch --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
 ```

 In order for the script to treat the compiled model as TorchScript
@@ -161,7 +161,7 @@ script to generate triton python model directory.
 An example invocation for the `gen_triton_model.py` for TensorFlow model can look like:

 ```
-$python3 gen_triton_model.py --model_type tensorflow --compiled_model /home/ubuntu/inferentia-poc-2.0/scripts-rn50-tf-native/resnet50_mlperf_opt_fp16_compiled_b5_nc1/1 --neuron_core_range 0:3 --triton_model_dir rn50-1neuroncores-bs1x1
+python3 gen_triton_model.py --model_type tensorflow --compiled_model /home/ubuntu/inferentia-poc-2.0/scripts-rn50-tf-native/resnet50_mlperf_opt_fp16_compiled_b5_nc1/1 --neuron_core_range 0:3 --triton_model_dir rn50-1neuroncores-bs1x1
 ```

 NOTE: Unlike TorchScript model, TensorFlow SavedModel stores sufficient
@@ -215,7 +215,7 @@ a valid torchscript file or tensorflow savedmodel.
 Now, the server can be launched with the model as below:

 ```
-$tritonserver --model-repository <path_to_model_repository>
+tritonserver --model-repository <path_to_model_repository>
 ```

 Note:
@@ -255,7 +255,7 @@ contains the necessary files to set up testing with a simple add_sub model. The
 requires an instance with more than 8 inferentia cores to run, eg:`inf1.6xlarge`.
 start the test, run
 ```
-$source <triton path>/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
+source <triton path>/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
 ```
 where `<triton path>` is usually `/home/ubuntu`/.
 This script will pull the [server repo](https://github.com/triton-inference-server/server)
@@ -265,16 +265,16 @@ Triton Server and Triton SDK.
 Note: If you would need to change some of the tests in the server repo,
 you would need to run
 ```
-$export TRITON_SERVER_REPO_TAG=<your branch name>
+export TRITON_SERVER_REPO_TAG=<your branch name>
 ```
 before running the script.

 # Using Triton with Inferentia 2, or Trn1
 ## pytorch-neuronx and tensorflow-neuronx
 1. Similar to the steps for inf1, change the argument to the pre-container and on-container setup scripts to include the `-inf2` or `-trn1`flags e.g.,
 ```
-$chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
-$sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh -inf2
+chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
+sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh -inf2
 ```
 2. On the container, followed by the `docker run` command, you can pass similar argument to the setup.sh script
 For Pytorch:
