
Commit 2489c65

Add testing setup script and some more notes to README (triton-inference-server#96)
- Add test.sh setup and run script to the inferentia/qa folder. Corresponding server PR: Add tests to inferentia (server#3586)
- Add to README: 1) pointers to how to compile a model, 2) instructions on how to run the test
1 parent d59c01e commit 2489c65

File tree: 5 files changed, +252 −12 lines

inferentia/README.md

Lines changed: 49 additions & 7 deletions
@@ -32,6 +32,17 @@ Starting from 21.11 release, Triton supports
 [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/)
 and the [Neuron Runtime](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/get-started.html).

+## Table of Contents
+
+- [Using Triton with Inferentia](#using-triton-with-inferentia)
+  - [Table of Contents](#table-of-contents)
+  - [Inferentia setup](#inferentia-setup)
+  - [Setting up the Inferentia model](#setting-up-the-inferentia-model)
+    - [PyTorch](#pytorch)
+    - [TensorFlow](#tensorflow)
+  - [Serving Inferentia model in Triton](#serving-inferentia-model-in-triton)
+  - [Testing Inferentia Setup for Accuracy](#testing-inferentia-setup-for-accuracy)
+
 ## Inferentia setup

 First step of running Triton with Inferentia is to create an AWS Inferentia
@@ -50,17 +61,18 @@ Clone this repo with Github to home repo `/home/ubuntu`.
 Ensure that the neuron runtime 1.0 daemon (neuron-rtd) is not running and set up
 and install neuron 2.X runtime builds with
 ```
-sudo ./python_backend/setup-pre-container.sh
+$ chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
+$ sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
 ```

 Then, start the Triton instance with:
 ```
-docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g -e "AWS_NEURON_VISIBLE_DEVICES=ALL" --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
+$ docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
 ```
 Note 1: The user needs to list every neuron device to be used during container initialization.
 For example, to use 4 neuron devices on an instance, the user would need to run with:
 ```
-docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...`
+$ docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...
 ```
 Note 2: `/mylib/udev` is used for Neuron parameter passing.

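As a convenience (a sketch, not from this commit), every `/dev/neuron*` node on the host can be expanded into a `--device` flag instead of listing devices by hand; the `<xx.yy>` tag placeholder from above still needs to be substituted:

```
# Sketch: build one --device flag per Neuron device visible on the host
DEVICE_ARGS=""
for dev in /dev/neuron*; do
  DEVICE_ARGS="${DEVICE_ARGS} --device ${dev}"
done
docker run ${DEVICE_ARGS} \
  -v /home/ubuntu/python_backend:/home/ubuntu/python_backend \
  -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 \
  -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
```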
@@ -70,7 +82,7 @@ Note 3: For Triton container version xx.yy, please refer to

 After starting the Triton container, go into the `python_backend` folder and run the setup script.
 ```
-source /home/ubuntu/python_backend/inferentia/scripts/setup .sh
+$ source /home/ubuntu/python_backend/inferentia/scripts/setup.sh
 ```
 This script will:
 1. Setup miniconda environment
@@ -84,7 +96,7 @@ There are user configurable options available for the script as well.
 For example, to set the python version for the environment to 3.6,
 you can run:
 ```
-source /home/ubuntu/python_backend/inferentia/scripts/setup.sh -v 3.6
+$ source /home/ubuntu/python_backend/inferentia/scripts/setup.sh -v 3.6
 ```
 Please use the `-h` or `--help` options to learn about more configurable options.

@@ -94,6 +106,15 @@ Currently, we only support [PyTorch](https://awsdocs-neuron.readthedocs-hosted.c
 and [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/index.html)
 workflows for execution on inferentia.

+The user is required to create their own `*.pt` (for PyTorch) or `*.savedmodels` (for TensorFlow) models. This is
+a critical step since Inferentia will need the underlying `.NEFF` graph to execute
+the inference request. Please refer to:
+- [Neuron compiler CLI Reference Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-cc/command-line-reference.html)
+- [PyTorch-Neuron trace python API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.html)
+- [PyTorch Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/tutorials/index.html)
+- [TensorFlow Tutorials](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/tensorflow-neuron/tutorials/index.html)
+
+for guidance on how to compile models.
 ### PyTorch

 For PyTorch, we support models traced by [PyTorch-Neuron trace python API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/api-compilation-python-api.html)
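For a concrete starting point, a minimal compilation sketch (not from this commit; assumes `torch-neuron` and `torchvision` are installed in the active environment, and the model choice, input shape, and output filename are placeholders):

```
# Sketch: trace a torchvision ResNet-50 into a TorchScript file carrying the NEFF graph
python - <<'EOF'
import torch
import torch_neuron  # registers the torch.neuron namespace
from torchvision import models

model = models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)                  # placeholder input shape
traced = torch.neuron.trace(model, example_inputs=[example])
traced.save("model.pt")                               # place this in the model repository
EOF
```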
@@ -195,11 +216,32 @@ a valid torchscript file or tensorflow savedmodel.
 Now, the server can be launched with the model as below:

 ```
-tritonserver --model-repository <path_to_model_repository>
+$ tritonserver --model-repository <path_to_model_repository>
 ```

 Note:
 1. The `config.pbtxt` and `model.py` should be treated as
 a starting point. The users can customize these files as per
 their need.
-2. Triton Inferentia is currently tested with a **single** model.
+2. Triton Inferentia is currently tested with a **single** model.
+
+## Testing Inferentia Setup for Accuracy
+The [qa folder](https://github.com/triton-inference-server/python_backend/tree/main/inferentia/qa)
+contains the necessary files to set up testing with a simple add_sub model. The test
+requires an instance with more than 8 inferentia cores, e.g. `inf1.6xlarge`.
+To start the test, run
+```
+$ source <triton path>/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
+```
+where `<triton path>` is usually `/home/ubuntu`.
+This script will pull the [server repo](https://github.com/triton-inference-server/server)
+that contains the tests for inferentia. It will then build the most recent
+Triton Server and Triton SDK.
+
+Note: If you need to change some of the tests in the server repo,
+you would need to run
+```
+$ export TRITON_SERVER_BRANCH_NAME=<your branch name>
+```
+before running the script.
+
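Once the server is launched, a quick smoke check over the HTTP endpoint can confirm the setup (a sketch, not from this commit; `add_sub` is the sample model name used by the qa setup):

```
# Readiness probe (KServe v2 HTTP API on the default port 8000)
curl -sf localhost:8000/v2/health/ready && echo "server ready"
# Metadata for the sample model
curl -s localhost:8000/v2/models/add_sub
```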

inferentia/qa/Dockerfile.QA

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# Copyright 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#
+# Multistage build.
+#
+ARG BASE_IMAGE=tritonserver
+ARG BUILD_IMAGE=tritonserver_build
+ARG SDK_IMAGE=tritonserver_sdk
+ARG TRITON_PATH=/home/ubuntu
+
+FROM ${SDK_IMAGE} AS sdk
+FROM $BASE_IMAGE
+# Ensure apt-get won't prompt for selecting options
+ENV DEBIAN_FRONTEND=noninteractive
+# install platform specific packages
+RUN if [ $(cat /etc/os-release | grep 'VERSION_ID="20.04"' | wc -l) -ne 0 ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            libpng-dev; \
+    elif [ $(cat /etc/os-release | grep 'VERSION_ID="18.04"' | wc -l) -ne 0 ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            libpng-dev; \
+    else \
+        echo "Ubuntu version must be either 18.04 or 20.04" && \
+        exit 1; \
+    fi
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        python3-dev \
+        python3-pip \
+        build-essential \
+        wget && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN rm -f /usr/bin/python && \
+    ln -s /usr/bin/python3 /usr/bin/python
+
+RUN pip3 install --upgrade wheel setuptools && \
+    pip3 install --upgrade numpy pillow attrdict future grpcio requests gsutil awscli six grpcio-channelz
+
+WORKDIR /opt/tritonserver
+# Copy the entire qa folder to /opt/tritonserver/qa
+COPY --from=tritonserver_build /workspace/qa qa
+COPY --chown=1000:1000 --from=sdk /workspace/install client_tmp
+RUN mkdir -p qa/clients && mkdir -p qa/pkgs && \
+    cp -a client_tmp/bin/* qa/clients/. && \
+    cp client_tmp/lib/libgrpcclient.so qa/clients/. && \
+    cp client_tmp/lib/libhttpclient.so qa/clients/. && \
+    cp client_tmp/python/*.py qa/clients/. && \
+    cp client_tmp/python/triton*.whl qa/pkgs/. && \
+    cp client_tmp/java/examples/*.jar qa/clients/. && \
+    rm -rf client_tmp
+# Create mount paths for lib
+RUN mkdir /mylib && mkdir /home/ubuntu
+
+ENV TRITON_PATH ${TRITON_PATH}
+ENV LD_LIBRARY_PATH /opt/tritonserver/qa/clients:${LD_LIBRARY_PATH}

inferentia/qa/setup_test_enviroment_and_test.sh

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+#!/bin/bash
+# Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+export TRITON_PATH="/home/ubuntu"
+export DEFAULT_REPO_TAG="main"
+export TRITON_COMMON_REPO_TAG=${DEFAULT_REPO_TAG}
+export TRITON_CORE_REPO_TAG=${DEFAULT_REPO_TAG}
+export TRITON_BACKEND_REPO_TAG=${DEFAULT_REPO_TAG}
+export TRITON_THIRD_PARTY_REPO_TAG=${DEFAULT_REPO_TAG}
+export IDENTITY_BACKEND_REPO_TAG=${DEFAULT_REPO_TAG}
+export PYTHON_BACKEND_REPO_TAG=${DEFAULT_REPO_TAG}
+export CHECKSUM_REPOAGENT_REPO_TAG=${DEFAULT_REPO_TAG}
+export TRITON_SERVER_BRANCH_NAME=${TRITON_SERVER_BRANCH_NAME:=${DEFAULT_REPO_TAG}}
+export TRITON_CLIENT_REPO_TAG=${TRITON_CLIENT_REPO_TAG:=${DEFAULT_REPO_TAG}}
+export TRITON_VERSION="2.17.0dev"
+export TRITON_CONTAINER_VERSION="21.12dev"
+export TRITON_UPSTREAM_CONTAINER_VERSION="21.10"
+export BASE_IMAGE=tritonserver
+export SDK_IMAGE=tritonserver_sdk
+export BUILD_IMAGE=tritonserver_build
+export QA_IMAGE=tritonserver_qa
+
+cd ${TRITON_PATH}
+# Clone necessary branches
+rm -rf ${TRITON_PATH}/server
+git clone --single-branch --depth=1 -b ${TRITON_SERVER_BRANCH_NAME} \
+    https://github.com/triton-inference-server/server.git
+echo ${TRITON_VERSION} > server/TRITON_VERSION
+cd ${TRITON_PATH}/server
+git clone --single-branch --depth=1 -b ${TRITON_CLIENT_REPO_TAG} \
+    https://github.com/triton-inference-server/client.git clientrepo
+
+# First set up inferentia and run in detached mode
+cd ${TRITON_PATH}/python_backend
+chmod 777 ${TRITON_PATH}/python_backend/inferentia/scripts/setup-pre-container.sh
+sudo ${TRITON_PATH}/python_backend/inferentia/scripts/setup-pre-container.sh
+
+# Build container with only python backend
+cd ${TRITON_PATH}/server
+pip3 install docker
+./build.py --build-dir=/tmp/tritonbuild \
+    --cmake-dir=${TRITON_PATH}/server/build \
+    --version=${TRITON_VERSION} \
+    --container-version=${TRITON_CONTAINER_VERSION} \
+    --enable-logging --enable-stats --enable-tracing \
+    --enable-metrics --enable-gpu-metrics --enable-gpu \
+    --filesystem=gcs --filesystem=azure_storage --filesystem=s3 \
+    --endpoint=http --endpoint=grpc \
+    --repo-tag=common:${TRITON_COMMON_REPO_TAG} \
+    --repo-tag=core:${TRITON_CORE_REPO_TAG} \
+    --repo-tag=backend:${TRITON_BACKEND_REPO_TAG} \
+    --repo-tag=thirdparty:${TRITON_THIRD_PARTY_REPO_TAG} \
+    --backend=identity:${IDENTITY_BACKEND_REPO_TAG} \
+    --backend=python:${PYTHON_BACKEND_REPO_TAG} \
+    --repoagent=checksum:${CHECKSUM_REPOAGENT_REPO_TAG}
+docker tag tritonserver_build "${BUILD_IMAGE}"
+docker tag tritonserver "${BASE_IMAGE}"
+
+# Build docker container for SDK
+docker build -t ${SDK_IMAGE} \
+    -f ${TRITON_PATH}/server/Dockerfile.sdk \
+    --build-arg "TRITON_CLIENT_REPO_SUBDIR=clientrepo" .
+
+# Build QA container
+docker build -t ${QA_IMAGE} \
+    -f ${TRITON_PATH}/python_backend/inferentia/qa/Dockerfile.QA \
+    --build-arg "TRITON_PATH=${TRITON_PATH}" \
+    --build-arg "BASE_IMAGE=${BASE_IMAGE}" \
+    --build-arg "BUILD_IMAGE=${BUILD_IMAGE}" \
+    --build-arg "SDK_IMAGE=${SDK_IMAGE}" .
+
+export TEST_JSON_REPO=/opt/tritonserver/qa/common/inferentia_perf_analyzer_input_data_json
+export TEST_REPO=/opt/tritonserver/qa/L0_inferentia_perf_analyzer
+export TEST_SCRIPT="test.sh"
+
+# Run single instance test
+CONTAINER_NAME="qa_container"
+docker stop ${CONTAINER_NAME} && docker rm ${CONTAINER_NAME}
+docker create --name ${CONTAINER_NAME} \
+    --device /dev/neuron0 \
+    --device /dev/neuron1 \
+    --shm-size=1g --ulimit memlock=-1 \
+    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
+    --ulimit stack=67108864 \
+    -e TEST_REPO=${TEST_REPO} \
+    -e TEST_JSON_REPO=${TEST_JSON_REPO} \
+    -e TRITON_PATH=${TRITON_PATH} \
+    --net host -ti ${QA_IMAGE} \
+    /bin/bash -c "bash -ex ${TEST_REPO}/${TEST_SCRIPT}" && \
+docker cp /lib/udev ${CONTAINER_NAME}:/mylib/udev && \
+docker cp /home/ubuntu/python_backend ${CONTAINER_NAME}:${TRITON_PATH}/python_backend && \
+docker start -a ${CONTAINER_NAME} || RV=$?;
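The script pins `TRITON_PATH` to `/home/ubuntu` but honors `TRITON_SERVER_BRANCH_NAME` and `TRITON_CLIENT_REPO_TAG` if they are already exported (see the `:=` defaults above), so a custom-branch run looks like:

```
# Test against a custom server branch and a pinned client tag
export TRITON_SERVER_BRANCH_NAME=<your branch name>
export TRITON_CLIENT_REPO_TAG=main
source /home/ubuntu/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
```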

inferentia/scripts/setup-pre-container.sh

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-#/bin/bash
+#!/bin/bash
 # Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
@@ -28,16 +28,16 @@ cd /home/ubuntu

 # First stop and remove old neuron 1.X runtime
 sudo systemctl stop neuron-rtd
-sudo apt remove aws-neuron-runtime
+sudo apt remove aws-neuron-runtime -y

 # Then install new neuron libraries
 . /etc/os-release
 sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
 deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
 EOF
 sudo wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -
-sudo apt-get update
-sudo apt-get install -y \
+sudo apt-get update && \
+sudo apt-get install -y \
     linux-headers-$(uname -r) \
     aws-neuron-dkms \
     aws-neuron-tools
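A quick way to confirm the install succeeded (assuming the default install location of `aws-neuron-tools`) is to list the visible Neuron devices:

```
# Verify the driver loaded and the devices are visible
/opt/aws/neuron/bin/neuron-ls
```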

inferentia/scripts/setup.sh

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Sets up python execution environment for AWS Neuron SDK for execution on Inferen
 OPTS=$(getopt -o hb:v:i:tp --long help,python-backend-path:,python-version:,inferentia-path:,use-tensorflow,use-pytorch -- "$@")


-export INFRENTIA_PATH="/home/ubuntu"
+export INFRENTIA_PATH=${TRITON_PATH:="/home/ubuntu"}
 export PYTHON_BACKEND_PATH="/home/ubuntu/python_backend"
 export PYTHON_VERSION=3.7
 export USE_PYTORCH=0
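With this change, `INFRENTIA_PATH` falls back to `/home/ubuntu` only when `TRITON_PATH` is unset, so a non-default location can be chosen up front (a sketch; the path is arbitrary, and `PYTHON_BACKEND_PATH` can likewise be overridden via the script's `-b`/`--python-backend-path` option):

```
# Point the setup at a custom Triton path before sourcing
export TRITON_PATH=/opt/triton
source ${TRITON_PATH}/python_backend/inferentia/scripts/setup.sh
```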
