mattaltberg
diff --git a/README.md b/README.md
Lines changed: 110 additions & 3 deletions
diff --git a/examples/pytorch_platform_handler/README.md b/examples/pytorch_platform_handler/README.md
Lines changed: 109 additions & 0 deletions
diff --git a/examples/pytorch_platform_handler/client.py b/examples/pytorch_platform_handler/client.py
Lines changed: 92 additions & 0 deletions
diff --git a/examples/pytorch_platform_handler/config.pbtxt b/examples/pytorch_platform_handler/config.pbtxt
Lines changed: 45 additions & 0 deletions
@@ -72,6 +72,7 @@ any C++ code.
   - [Input Tensor Device Placement](#input-tensor-device-placement)
 - [Frameworks](#frameworks)
   - [PyTorch](#pytorch)
+    - [PyTorch Platform \[Experimental\]](#pytorch-platform-experimental)
     - [PyTorch Determinism](#pytorch-determinism)
   - [TensorFlow](#tensorflow)
     - [TensorFlow Determinism](#tensorflow-determinism)
@@ -1397,9 +1398,115 @@ this workflow.
 For a simple example of using PyTorch in a Python Backend model, see the
 [AddSubNet PyTorch example](#addsubnet-in-pytorch).
 
-PyTorch models may be served directly without implementing the `model.py`, see
-[Serving PyTorch models using Python Backend \[Experimental\]](src/resources/platform_handlers/pytorch/README.md)
-for more details.
+### PyTorch Platform \[Experimental\]
+
+**NOTE**: *This feature is subject to change and removal, and should not
+be used in production.*
+
+Starting from 23.08, the Python backend has experimental support for loading
+and serving PyTorch models directly. The model is placed in the Triton Server
+model repository, and a
+[pre-built Python model](src/resources/platform_handlers/pytorch/model.py)
+loads and serves it.
+
+#### Model Layout
+
+The model repository should look like:
+
+```
+model_repository/
+`-- model_directory
+    |-- 1
+    |   |-- model.py
+    |   `-- model.pt
+    `-- config.pbtxt
+```
+
+The `model.py` contains the class definition of the PyTorch model. The class
+should extend
+[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
+A `model.pt` containing the saved
+[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
+of the model may optionally be provided. To serve a TorchScript model, provide
+a TorchScript `model.pt` in place of the `model.py` file.
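
A layout like the one above can be checked before starting the server. The following is a minimal sketch, not part of Triton: the helper name `check_model_dir` and its messages are hypothetical, and it only encodes the rule stated above that each version directory holds a `model.py`, a `model.pt`, or both.

```python
import tempfile
from pathlib import Path


def check_model_dir(model_dir: Path) -> list:
    """Return a list of layout problems; an empty list means the layout looks fine."""
    if not model_dir.is_dir():
        return [f"{model_dir} is not a directory"]
    problems = []
    if not (model_dir / "config.pbtxt").is_file():
        problems.append("missing config.pbtxt")
    # Version directories are named with plain integers such as "1".
    versions = sorted(p for p in model_dir.iterdir() if p.is_dir() and p.name.isdigit())
    if not versions:
        problems.append("no numeric version directory")
    for version in versions:
        has_model = (version / "model.py").is_file() or (version / "model.pt").is_file()
        if not has_model:
            problems.append(f"version {version.name}: needs model.py or model.pt")
    return problems


# Build a throwaway repository and check it before and after adding model.py.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model_repository" / "model_directory"
    (model_dir / "1").mkdir(parents=True)
    (model_dir / "config.pbtxt").write_text('backend: "python"\nplatform: "pytorch"\n')
    before = check_model_dir(model_dir)
    (model_dir / "1" / "model.py").write_text("# class definition goes here\n")
    after = check_model_dir(model_dir)

print(before)
print(after)
```

The check is deliberately shallow: it does not open `model.py` or `model.pt`, so it cannot tell whether the class actually extends `torch.nn.Module`.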
+
+By default, Triton will use the
+[PyTorch backend](https://github.com/triton-inference-server/pytorch_backend) to
+load and serve TorchScript models. To serve from the Python backend, the
+[model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md)
+should explicitly provide the following settings:
+
+```
+backend: "python"
+platform: "pytorch"
+```
+
+#### PyTorch Installation
+
+This feature takes advantage of the
+[`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
+optimization, so make sure the
+[PyTorch 2.0+ pip package](https://pypi.org/project/torch/2.0.1/) is available
+in the same Python environment.
+
+```
+pip install torch==2.0.1
+```
+Alternatively, a
+[Python Execution Environment](#using-custom-python-execution-environments)
+with the PyTorch dependency may be used.
+
+#### Customization
+
+The following PyTorch settings may be customized by setting parameters in the
+`config.pbtxt`.
+
+[`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)
+- Key: NUM_THREADS
+- Value: The number of threads used for intraop parallelism on CPU.
+
+[`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)
+- Key: NUM_INTEROP_THREADS
+- Value: The number of threads used for interop parallelism (e.g. in JIT
+interpreter) on CPU.
+
+[`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
+- Key: TORCH_COMPILE_OPTIONAL_PARAMETERS
+- Value: Any of the following parameter(s) encoded as a JSON object.
+  - fullgraph (*bool*): Whether it is ok to break the model into several subgraphs.
+  - dynamic (*bool*): Use dynamic shape tracing.
+  - backend (*str*): The backend to be used.
+  - mode (*str*): Can be either "default", "reduce-overhead" or "max-autotune".
+  - options (*dict*): A dictionary of options to pass to the backend.
+  - disable (*bool*): Turn `torch.compile()` into a no-op for testing.
+
+For example:
+```
+parameters: {
+    key: "NUM_THREADS"
+    value: { string_value: "4" }
+}
+parameters: {
+    key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
+    value: { string_value: "{\"disable\": true}" }
+}
+```
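
The escaped quotes in the `TORCH_COMPILE_OPTIONAL_PARAMETERS` value are easy to get wrong by hand. As a hedged sketch, the snippet below generates such an entry with the standard `json` module. `pbtxt_parameter` is a hypothetical helper, not a Triton API, and it assumes ASCII keys and values, for which JSON string escaping matches what the text format expects.

```python
import json


def pbtxt_parameter(key: str, value: str) -> str:
    """Render one `parameters` entry; json.dumps supplies quoting and escaping."""
    return (
        "parameters: {\n"
        f"    key: {json.dumps(key)}\n"
        f"    value: {{ string_value: {json.dumps(value)} }}\n"
        "}"
    )


# Keyword arguments intended for torch.compile(), encoded as a JSON object.
compile_kwargs = {"disable": True}
encoded = json.dumps(compile_kwargs)

entry = pbtxt_parameter("TORCH_COMPILE_OPTIONAL_PARAMETERS", encoded)
print(entry)

# Decoding the string value recovers the original dictionary.
print(json.loads(encoded))
```

Encoding the dictionary first and quoting it second keeps the two layers of escaping separate, which matters once `options` carries nested values.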
+
+#### Example
+
+You can find the complete example instructions in
+[examples/pytorch_platform_handler](examples/pytorch_platform_handler/README.md).
+
+#### Limitations
+
+The following are a few known limitations of this feature:
+- Python functions optimizable by `torch.compile` may not be served directly in
+the `model.py` file; they need to be enclosed by a class extending
+[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
+- Model weights cannot be shared across multiple instances on the same GPU
+device.
+- When using `KIND_MODEL` as the model instance kind, the default device of the
+first parameter on the model is used.
 
 ### PyTorch Determinism
 
 
@@ -0,0 +1,109 @@
+<!--
+# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+-->
+
+# PyTorch Example
+
+In this section, we demonstrate an end-to-end example for using the
+[PyTorch Platform \[Experimental\]](../../README.md#pytorch-platform-experimental)
+to serve a PyTorch model directly, **without** needing to implement the
+`TritonPythonModel` class.
+
+## Create a ResNet50 model repository
+
+We will use the files that come with this example to create the model
+repository.
+
+First, download [client.py](client.py), [config.pbtxt](config.pbtxt),
+[model.py](model.py),
+[mug.jpg](https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg)
+and [resnet50_labels.txt](resnet50_labels.txt) to your local machine.
+
+Next, in the directory where the downloaded files are saved, create a model
+repository with the following commands:
+```
+$ mkdir -p models/resnet50_pytorch/1
+$ mv model.py models/resnet50_pytorch/1
+$ mv config.pbtxt models/resnet50_pytorch
+```
+
+## Pull the Triton Docker images
+
+Docker and the NVIDIA Container Toolkit must be installed before proceeding;
+refer to the
+[installation steps](https://github.com/triton-inference-server/server/tree/main/docs#installation).
+
+To pull the latest containers, run the following commands:
+```
+$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
+$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
+```
+See the installation steps above for the `<yy.mm>` version.
+
+For example, if the version is `23.08`, then:
+```
+$ docker pull nvcr.io/nvidia/tritonserver:23.08-py3
+$ docker pull nvcr.io/nvidia/tritonserver:23.08-py3-sdk
+```
+
+Be sure to replace `<yy.mm>` with the pulled version in all the remaining
+parts of this example.
+
+## Start the Triton Server
+
+In the directory where we created the PyTorch model (where the "models"
+folder is located), run the following command:
+```
+$ docker run -it --rm --gpus all --shm-size 1g -p 8000:8000 -v `pwd`:/pytorch_example nvcr.io/nvidia/tritonserver:<yy.mm>-py3 /bin/bash
+```
+
+Inside the container, we need to install PyTorch, Pillow and Requests to run this example.
+We recommend using `pip` for the installations, for example:
+```
+$ pip3 install torch Pillow requests
+```
+
+Finally, start the Triton Server with the following command:
+```
+$ tritonserver --model-repository=/pytorch_example/models
+```
+
+To leave the container for the next step, press `CTRL + P` followed by `CTRL + Q`.
+
+## Test inference
+
+In the directory where `client.py` is located, run the following command:
+```
+$ docker run --rm --net=host -v `pwd`:/pytorch_example nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk python3 /pytorch_example/client.py
+```
+
+A successful inference will print the following at the end:
+```
+Result: COFFEE MUG
+Expected result: COFFEE MUG
+PASS: PyTorch platform handler
+```
@@ -0,0 +1,92 @@
+#!/usr/bin/env python3
+
+# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import os
+import sys
+
+import numpy as np
+from PIL import Image
+from tritonclient import http as httpclient
+from tritonclient.utils import *
+
+script_directory = os.path.dirname(os.path.realpath(__file__))
+
+server_url = "localhost:8000"
+model_name = "resnet50_pytorch"
+input_name = "INPUT"
+output_name = "OUTPUT"
+label_path = os.path.join(script_directory, "resnet50_labels.txt")
+# The 'mug.jpg' image will be present in script_directory if the steps in
+# the provided README.md are followed. The image may also be found at
+# '/workspace/images/mug.jpg' on the SDK container or
+# '/opt/tritonserver/qa/images/mug.jpg' on the QA container.
+image_path = os.path.join(script_directory, "mug.jpg")
+expected_output_class = "COFFEE MUG"
+
+
+def _load_input_image():
+    raw_image = Image.open(image_path)
+    raw_image = raw_image.convert("RGB").resize((224, 224), Image.BILINEAR)
+    input_image = np.array(raw_image).astype(np.float32)
+    input_image = (input_image / 127.5) - 1
+    input_image = np.transpose(input_image, (2, 0, 1))
+    input_image = np.reshape(input_image, (1, 3, 224, 224))
+    return input_image
+
+
+def _infer(input_image):
+    with httpclient.InferenceServerClient(server_url) as client:
+        input_tensors = httpclient.InferInput(input_name, input_image.shape, "FP32")
+        input_tensors.set_data_from_numpy(input_image)
+        results = client.infer(model_name=model_name, inputs=[input_tensors])
+        output_tensors = results.as_numpy(output_name)
+    return output_tensors
+
+
+def _check_output(output_tensors):
+    with open(label_path) as f:
+        labels_dict = {idx: line.strip() for idx, line in enumerate(f)}
+    max_id = np.argmax(output_tensors, axis=1)[0]
+    output_class = labels_dict[max_id]
+    print("Result: " + output_class)
+    print("Expected result: " + expected_output_class)
+    if output_class != expected_output_class:
+        return False
+    return True
+
+
+if __name__ == "__main__":
+    input_image = _load_input_image()
+    output_tensors = _infer(input_image)
+    result_valid = _check_output(output_tensors)
+
+    if not result_valid:
+        print("PyTorch platform handler example error: Unexpected result")
+        sys.exit(1)
+
+    print("PASS: PyTorch platform handler")
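
The preprocessing in `_load_input_image` rescales each 8-bit pixel from [0, 255] to [-1, 1], moves the channel axis first, and adds a batch dimension. The dependency-free sketch below reproduces that arithmetic on nested lists for illustration only. `to_nchw` is a hypothetical helper, not part of the client, which does the same work with NumPy.

```python
def to_nchw(image_hwc):
    """Convert H x W x C pixel values in 0..255 to a 1 x C x H x W float layout."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    chw = [
        [
            # pixel / 127.5 - 1 maps 0 to -1.0 and 255 to 1.0
            [image_hwc[y][x][c] / 127.5 - 1.0 for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]
    return [chw]  # leading batch dimension of size 1


# A 1 x 2 RGB image: one row, two pixels.
tiny_image = [[[0, 255, 0], [255, 0, 255]]]
batch = to_nchw(tiny_image)

print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))
print(batch[0][0])  # red channel after scaling
```

For the real 224 x 224 input the resulting shape is `(1, 3, 224, 224)`, matching the `InferInput` shape the client sends.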
@@ -0,0 +1,45 @@
+# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+name: "resnet50_pytorch"
+backend: "python"
+platform: "pytorch"
+
+max_batch_size: 128
+
+input {
+  name: "INPUT"
+  data_type: TYPE_FP32
+  format: FORMAT_NCHW
+  dims: [ 3, 224, 224 ]
+}
+output {
+  name: "OUTPUT"
+  data_type: TYPE_FP32
+  dims: [ 1000 ]
+}
+
+instance_group [{ kind: KIND_CPU }]
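
Because `max_batch_size` is greater than zero, the batch dimension is implicit and is not listed in `dims`, which is why `client.py` sends a tensor of shape `(1, 3, 224, 224)`. The sketch below spells out that rule. `expected_request_shape` is a hypothetical helper for illustration, not a Triton API.

```python
def expected_request_shape(max_batch_size, dims, batch=1):
    """Return the full tensor shape a request should carry for this config."""
    if max_batch_size > 0:
        # Batching enabled: the batch dimension is prepended to `dims`.
        if not 1 <= batch <= max_batch_size:
            raise ValueError(f"batch must be between 1 and {max_batch_size}")
        return [batch, *dims]
    # Batching disabled: `dims` already describes the full shape.
    return list(dims)


input_shape = expected_request_shape(128, [3, 224, 224], batch=1)
output_shape = expected_request_shape(128, [1000], batch=1)
print(input_shape)
print(output_shape)
```

The output shape explains the `np.argmax(output_tensors, axis=1)[0]` call in the client: axis 0 is the batch and axis 1 holds the 1000 class scores.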