
Commit 0413e46

1 parent 34a4db5 commit 0413e46

6 files changed: +97 -69 lines changed


examples/auto_complete/README.md

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -59,12 +59,12 @@ respectively.
 1. Create the model repository:

 ```console
-$ mkdir -p models/nobatch_auto_complete/1/
-$ mkdir -p models/batch_auto_complete/1/
+mkdir -p models/nobatch_auto_complete/1/
+mkdir -p models/batch_auto_complete/1/

 # Copy the Python models
-$ cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
-$ cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
+cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
+cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
 ```
 **Note that we don't need a model configuration file since Triton will use the
 auto-complete model configuration provided in the Python model.**
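
For context, the note in this diff relies on the Python backend's auto-complete feature: the model itself supplies its configuration through an `auto_complete_config` method, so no `config.pbtxt` is needed. A minimal sketch of that pattern, with illustrative tensor names and shapes rather than the ones used by the example models:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Register the model's inputs/outputs so Triton can complete the
        # configuration even though no config.pbtxt is in the repository.
        auto_complete_model_config.add_input(
            {"name": "INPUT0", "data_type": "TYPE_FP32", "dims": [4]})
        auto_complete_model_config.add_output(
            {"name": "OUTPUT0", "data_type": "TYPE_FP32", "dims": [4]})
        # 0 disables batching; a batching model would set a positive value
        # (and typically enable dynamic batching) here instead.
        auto_complete_model_config.set_max_batch_size(0)
        return auto_complete_model_config

    def execute(self, requests):
        # Trivial pass-through body so the sketch is a complete model.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
```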

examples/bls/README.md

Lines changed: 19 additions & 19 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -51,17 +51,17 @@ final outputs.
 1. Create the model repository:

 ```console
-$ mkdir -p models/add_sub/1
-$ mkdir -p models/bls_sync/1
-$ mkdir -p models/pytorch/1
+mkdir -p models/add_sub/1
+mkdir -p models/bls_sync/1
+mkdir -p models/pytorch/1

 # Copy the Python models
-$ cp examples/add_sub/model.py models/add_sub/1/
-$ cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
-$ cp examples/bls/sync_model.py models/bls_sync/1/model.py
-$ cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
-$ cp examples/pytorch/model.py models/pytorch/1/
-$ cp examples/pytorch/config.pbtxt models/pytorch/
+cp examples/add_sub/model.py models/add_sub/1/
+cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
+cp examples/bls/sync_model.py models/bls_sync/1/model.py
+cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
+cp examples/pytorch/model.py models/pytorch/1/
+cp examples/pytorch/config.pbtxt models/pytorch/
 ```

 2. Start the tritonserver:
@@ -124,17 +124,17 @@ to construct the final inference response object using these tensors.
 1. Create the model repository:

 ```console
-$ mkdir -p models/add_sub/1
-$ mkdir -p models/bls_async/1
-$ mkdir -p models/pytorch/1
+mkdir -p models/add_sub/1
+mkdir -p models/bls_async/1
+mkdir -p models/pytorch/1

 # Copy the Python models
-$ cp examples/add_sub/model.py models/add_sub/1/
-$ cp examples/add_sub/config.pbtxt models/add_sub/
-$ cp examples/bls/async_model.py models/bls_async/1/model.py
-$ cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
-$ cp examples/pytorch/model.py models/pytorch/1/
-$ cp examples/pytorch/config.pbtxt models/pytorch/
+cp examples/add_sub/model.py models/add_sub/1/
+cp examples/add_sub/config.pbtxt models/add_sub/
+cp examples/bls/async_model.py models/bls_async/1/model.py
+cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
+cp examples/pytorch/model.py models/pytorch/1/
+cp examples/pytorch/config.pbtxt models/pytorch/
 ```

 2. Start the tritonserver:
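
For context on the BLS example edited above: the `bls_sync` and `bls_async` models call the other models in the repository (`add_sub`, `pytorch`) from inside their own `execute` and assemble the final response from the returned tensors. A rough sync-style sketch with illustrative tensor names, not the exact code shipped in `sync_model.py`:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            in1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")
            # Issue a BLS request against another model loaded on the same
            # server and wait for it synchronously.
            bls_request = pb_utils.InferenceRequest(
                model_name="add_sub",
                requested_output_names=["OUTPUT0", "OUTPUT1"],
                inputs=[in0, in1])
            bls_response = bls_request.exec()
            if bls_response.has_error():
                raise pb_utils.TritonModelException(
                    bls_response.error().message())
            # Repackage the BLS outputs as this model's own response tensors.
            out0 = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT0")
            out1 = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT1")
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out0, out1]))
        return responses
```

The async variant follows the same shape but awaits `bls_request.async_exec()` from an `async def execute` instead of blocking on `exec()`.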

examples/decoupled/README.md

Lines changed: 7 additions & 7 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -47,14 +47,14 @@ real deployment, the model should not allow the caller thread to return from
 1. Create the model repository:

 ```console
-$ mkdir -p models/repeat_int32/1
-$ mkdir -p models/square_int32/1
+mkdir -p models/repeat_int32/1
+mkdir -p models/square_int32/1

 # Copy the Python models
-$ cp examples/decoupled/repeat_model.py models/repeat_int32/1/model.py
-$ cp examples/decoupled/repeat_config.pbtxt models/repeat_int32/config.pbtxt
-$ cp examples/decoupled/square_model.py models/square_int32/1/model.py
-$ cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
+cp examples/decoupled/repeat_model.py models/repeat_int32/1/model.py
+cp examples/decoupled/repeat_config.pbtxt models/repeat_int32/config.pbtxt
+cp examples/decoupled/square_model.py models/square_int32/1/model.py
+cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
 ```

 2. Start the tritonserver:
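
The decoupled models referenced above (`repeat_int32`, `square_int32`) return zero or more responses per request through a response sender rather than from `execute` itself. A minimal sketch of that pattern with an illustrative tensor name; the real repeat/square models in this example are more involved:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            count = int(pb_utils.get_input_tensor_by_name(
                request, "IN").as_numpy()[0])
            # Stream one response per requested repetition.
            for i in range(count):
                out = pb_utils.Tensor("OUT", np.array([i], dtype=np.int32))
                sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            # Close the response stream for this request.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        # Decoupled models return None; all responses flow through the sender.
        return None
```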

examples/jax/README.md

Lines changed: 12 additions & 12 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -42,9 +42,9 @@ First, download the [client.py](client.py), [config.pbtxt](config.pbtxt) and
 Next, at the directory where the three files located, create the model
 repository with the following commands:
 ```
-$ mkdir -p models/jax/1
-$ mv model.py models/jax/1
-$ mv config.pbtxt models/jax
+mkdir -p models/jax/1
+mv model.py models/jax/1
+mv config.pbtxt models/jax
 ```

 ## Pull the Triton Docker images
@@ -55,16 +55,16 @@ to the

 To pull the latest containers, run the following commands:
 ```
-$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
-$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
+docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
 ```
 See the installation steps above for the `<yy.mm>` version.

 At the time of writing, the latest version is `23.04`, which translates to the
 following commands:
 ```
-$ docker pull nvcr.io/nvidia/tritonserver:23.04-py3
-$ docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
+docker pull nvcr.io/nvidia/tritonserver:23.04-py3
+docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
 ```

 Be sure to replace the `<yy.mm>` with the version pulled for all the remaining
@@ -75,7 +75,7 @@ parts of this example.
 At the directory where we created the JAX models (at where the "models" folder
 is located), run the following command:
 ```
-$ docker run --gpus all -it --rm -p 8000:8000 -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3 /bin/bash
+docker run --gpus all -it --rm -p 8000:8000 -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3 /bin/bash
 ```

 Inside the container, we need to install JAX to run this example.
@@ -87,12 +87,12 @@ dependencies.

 To install for this example, run the following command:
 ```
-$ pip3 install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
+pip3 install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
 ```

 Finally, we need to start the Triton Server, run the following command:
 ```
-$ tritonserver --model-repository=/jax/models
+tritonserver --model-repository=/jax/models
 ```

 To leave the container for the next step, press: `CTRL + P + Q`.
@@ -101,7 +101,7 @@ To leave the container for the next step, press: `CTRL + P + Q`.

 At the directory where the client.py is located, run the following command:
 ```
-$ docker run --rm --net=host -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk python3 /jax/client.py
+docker run --rm --net=host -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk python3 /jax/client.py
 ```

 A successful inference will print the following at the end:
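
For context on that last step: `client.py` sends input tensors to the `jax` model over HTTP and prints the returned outputs. A rough sketch of such a client using `tritonclient`; the tensor names, shapes, and dtypes are illustrative and may not match the shipped script:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Two illustrative FP32 vectors of length 4.
input0 = np.random.rand(4).astype(np.float32)
input1 = np.random.rand(4).astype(np.float32)

inputs = [
    httpclient.InferInput("INPUT0", list(input0.shape), "FP32"),
    httpclient.InferInput("INPUT1", list(input1.shape), "FP32"),
]
inputs[0].set_data_from_numpy(input0)
inputs[1].set_data_from_numpy(input1)

result = client.infer(model_name="jax", inputs=inputs)
print("OUTPUT0:", result.as_numpy("OUTPUT0"))
print("OUTPUT1:", result.as_numpy("OUTPUT1"))
```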

examples/preprocessing/README.md

Lines changed: 42 additions & 14 deletions
@@ -1,43 +1,71 @@
+<!--
+# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+# * Neither the name of NVIDIA CORPORATION nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+-->
+
 # **Preprocessing Using Python Backend Example**
 This example shows how to preprocess your inputs using Python backend before it is passed to the TensorRT model for inference. This ensemble model includes an image preprocessing model (preprocess) and a TensorRT model (resnet50_trt) to do inference.

 **1. Converting PyTorch Model to ONNX format:**

 Run onnx_exporter.py to convert ResNet50 PyTorch model to ONNX format. Width and height dims are fixed at 224 but dynamic axes arguments for dynamic batching are used. Commands from the 2. and 3. subsections shall be executed within this Docker container.

-$ docker run -it --gpus=all -v $(pwd):/workspace nvcr.io/nvidia/pytorch:xx.yy-py3 bash
-$ pip install numpy pillow torchvision
-$ python onnx_exporter.py --save model.onnx
+docker run -it --gpus=all -v $(pwd):/workspace nvcr.io/nvidia/pytorch:xx.yy-py3 bash
+pip install numpy pillow torchvision
+python onnx_exporter.py --save model.onnx

 **2. Create the model repository:**

-$ mkdir -p model_repository/ensemble_python_resnet50/1
-$ mkdir -p model_repository/preprocess/1
-$ mkdir -p model_repository/resnet50_trt/1
+mkdir -p model_repository/ensemble_python_resnet50/1
+mkdir -p model_repository/preprocess/1
+mkdir -p model_repository/resnet50_trt/1

 # Copy the Python model
-$ cp model.py model_repository/preprocess/1
+cp model.py model_repository/preprocess/1

 **3. Build a TensorRT engine for the ONNX model**

 Set the arguments for enabling fp16 precision --fp16. To enable dynamic shapes use --minShapes, --optShapes, and maxShapes with --explicitBatch:

-$ trtexec --onnx=model.onnx --saveEngine=./model_repository/resnet50_trt/1/model.plan --explicitBatch --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:256x3x224x224 --fp16
+trtexec --onnx=model.onnx --saveEngine=./model_repository/resnet50_trt/1/model.plan --explicitBatch --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:256x3x224x224 --fp16

 **4. Run the command below to start the server container:**

 Under python_backend/examples/preprocessing, run this command to start the server docker container:

-$ docker run --gpus=all -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:xx.yy-py3 bash
-$ pip install numpy pillow torchvision
-$ tritonserver --model-repository=/models
+docker run --gpus=all -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:xx.yy-py3 bash
+pip install numpy pillow torchvision
+tritonserver --model-repository=/models

 **5. Start the client to test:**

 Under python_backend/examples/preprocessing, run the commands below to start the client Docker container:

-$ wget https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg -O "mug.jpg"
-$ docker run --rm --net=host -v $(pwd):/workspace/ nvcr.io/nvidia/tritonserver:xx.yy-py3-sdk python client.py --image mug.jpg
-$ The result of classification is:COFFEE MUG
+wget https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg -O "mug.jpg"
+docker run --rm --net=host -v $(pwd):/workspace/ nvcr.io/nvidia/tritonserver:xx.yy-py3-sdk python client.py --image mug.jpg
+The result of classification is:COFFEE MUG

 Here, since we input an image of "mug" and the inference result is "COFFEE MUG" which is correct.
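
Step 1 of the README above describes what `onnx_exporter.py` does: export ResNet50 to ONNX with the spatial dims fixed at 224 and a dynamic batch axis so the TensorRT engine can later be built for dynamic batching. A rough sketch of that kind of export (the actual script's flags and axis names may differ):

```python
import torch
import torchvision.models as models

# Export ResNet50 to ONNX with a dynamic batch dimension; height and width
# stay fixed at 224. Illustrative sketch, not the example's onnx_exporter.py.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```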

inferentia/README.md

Lines changed: 12 additions & 12 deletions
@@ -60,18 +60,18 @@ or simply clone with https.
 Clone this repo with Github to home repo `/home/ubuntu`.

 ```
-$chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
-$sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
+chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
+sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
 ```

 Then, start the Triton instance with:
 ```
-$docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
+docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
 ```
 Note 1: The user would need to list any neuron device to run during container initialization.
 For example, to use 4 neuron devices on an instance, the user would need to run with:
 ```
-$docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...`
+docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...`
 ```
 Note 2: `/mylib/udev` is used for Neuron parameter passing.

@@ -81,7 +81,7 @@ Note 3: For Triton container version xx.yy, please refer to

 After starting the Triton container, go into the `python_backend` folder and run the setup script.
 ```
-$source /home/ubuntu/python_backend/inferentia/scripts/setup.sh
+source /home/ubuntu/python_backend/inferentia/scripts/setup.sh
 ```
 This script will:
 1. Install necessary dependencies
@@ -118,7 +118,7 @@ triton python model directory.
 An example invocation for the `gen_triton_model.py` for PyTorch model can look like:

 ```
-$python3 inferentia/scripts/gen_triton_model.py --model_type pytorch --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
+python3 inferentia/scripts/gen_triton_model.py --model_type pytorch --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
 ```

 In order for the script to treat the compiled model as TorchScript
@@ -161,7 +161,7 @@ script to generate triton python model directory.
 An example invocation for the `gen_triton_model.py` for TensorFlow model can look like:

 ```
-$python3 gen_triton_model.py --model_type tensorflow --compiled_model /home/ubuntu/inferentia-poc-2.0/scripts-rn50-tf-native/resnet50_mlperf_opt_fp16_compiled_b5_nc1/1 --neuron_core_range 0:3 --triton_model_dir rn50-1neuroncores-bs1x1
+python3 gen_triton_model.py --model_type tensorflow --compiled_model /home/ubuntu/inferentia-poc-2.0/scripts-rn50-tf-native/resnet50_mlperf_opt_fp16_compiled_b5_nc1/1 --neuron_core_range 0:3 --triton_model_dir rn50-1neuroncores-bs1x1
 ```

 NOTE: Unlike TorchScript model, TensorFlow SavedModel stores sufficient
@@ -215,7 +215,7 @@ a valid torchscript file or tensorflow savedmodel.
 Now, the server can be launched with the model as below:

 ```
-$tritonserver --model-repository <path_to_model_repository>
+tritonserver --model-repository <path_to_model_repository>
 ```

 Note:
@@ -255,7 +255,7 @@ contains the necessary files to set up testing with a simple add_sub model. The
 requires an instance with more than 8 inferentia cores to run, eg:`inf1.6xlarge`.
 start the test, run
 ```
-$source <triton path>/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
+source <triton path>/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
 ```
 where `<triton path>` is usually `/home/ubuntu`/.
 This script will pull the [server repo](https://github.com/triton-inference-server/server)
@@ -265,16 +265,16 @@ Triton Server and Triton SDK.
 Note: If you would need to change some of the tests in the server repo,
 you would need to run
 ```
-$export TRITON_SERVER_REPO_TAG=<your branch name>
+export TRITON_SERVER_REPO_TAG=<your branch name>
 ```
 before running the script.

 # Using Triton with Inferentia 2, or Trn1
 ## pytorch-neuronx and tensorflow-neuronx
 1. Similar to the steps for inf1, change the argument to the pre-container and on-container setup scripts to include the `-inf2` or `-trn1`flags e.g.,
 ```
-$chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
-$sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh -inf2
+chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
+sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh -inf2
 ```
 2. On the container, followed by the `docker run` command, you can pass similar argument to the setup.sh script
 For Pytorch:
