@@ -45,6 +45,7 @@ any C++ code.
* [Managing Shared Memory](#managing-shared-memory)
* [Building From Source](#building-from-source)
* [Business Logic Scripting (beta)](#business-logic-scripting-beta)
+ * [Interoperability and GPU Support](#interoperability-and-gpu-support)

## Quick Start

@@ -608,6 +609,76 @@ do not create a circular dependency. For example, if model A performs an inferen
on itself and there are no more model instances ready to execute the inference request, the
model will block on the inference execution forever.

+ # Interoperability and GPU Support
+
+ Starting from the 21.09 release, the Python backend supports
+ [DLPack](https://github.com/dmlc/dlpack) for zero-copy transfer of Python
+ backend tensors to other frameworks. The methods below have been added to the
+ `pb_utils.Tensor` class to facilitate this:
+
+ ## `pb_utils.Tensor.to_dlpack() -> PyCapsule`
+
+ This method can be called on an existing tensor to convert it to DLPack.
+ The code snippet below shows how this works with PyTorch:
+
+ ```python
+ from torch.utils.dlpack import from_dlpack
+ import triton_python_backend_utils as pb_utils
+
+ class TritonPythonModel:
+
+     def execute(self, requests):
+         ...
+         input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
+
+         # We have converted a Python backend tensor to a PyTorch tensor without
+         # making any copies.
+         pytorch_tensor = from_dlpack(input0.to_dlpack())
+ ```
+
+ ## `pb_utils.Tensor.from_dlpack() -> Tensor`
+
+ This static method can be used to create a `Tensor` object from the DLPack
+ encoding of another framework's tensor. For example:
+
+ ```python
+ from torch.utils.dlpack import to_dlpack
+ import torch
+ import triton_python_backend_utils as pb_utils
+
+ class TritonPythonModel:
+
+     def execute(self, requests):
+         ...
+         pytorch_tensor = torch.tensor([1, 2, 3], device='cuda')
+
+         # Create a Python backend tensor from the DLPack encoding of a PyTorch
+         # tensor.
+         input0 = pb_utils.Tensor.from_dlpack(to_dlpack(pytorch_tensor))
+ ```
+
+ This method only supports contiguous tensors that are in C order. If the tensor
+ is not contiguous in C order, an exception will be raised.
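+
+ If a tensor may not be C-contiguous (for example, a transposed view), one way to
+ satisfy this requirement is to make an explicit C-ordered copy before the
+ conversion. The sketch below assumes PyTorch and its `contiguous()` method; note
+ that this copy gives up the zero-copy benefit for that tensor:
+
+ ```python
+ from torch.utils.dlpack import to_dlpack
+ import torch
+ import triton_python_backend_utils as pb_utils
+
+ # A transposed view is usually not C-contiguous.
+ transposed = torch.rand(2, 3, device='cuda').T
+
+ # contiguous() copies the data into C-ordered memory so that
+ # from_dlpack can accept the resulting DLPack capsule.
+ backend_tensor = pb_utils.Tensor.from_dlpack(to_dlpack(transposed.contiguous()))
+ ```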
+
+ ## `pb_utils.Tensor.is_cpu() -> bool`
+
+ This method can be used to check whether a tensor is placed in CPU memory or not.
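+
+ For example, a model that receives input tensors on either device can branch on
+ this check. The sketch below is illustrative only; it assumes `as_numpy()` is
+ used for CPU tensors and DLPack (as shown above) for GPU tensors:
+
+ ```python
+ from torch.utils.dlpack import from_dlpack
+ import triton_python_backend_utils as pb_utils
+
+ class TritonPythonModel:
+
+     def execute(self, requests):
+         ...
+         input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
+
+         if input0.is_cpu():
+             # CPU tensors can be read directly as NumPy arrays.
+             data = input0.as_numpy()
+         else:
+             # GPU tensors can be handed to another framework such as PyTorch
+             # through DLPack without making any copies.
+             data = from_dlpack(input0.to_dlpack())
+ ```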
+
+ ## Controlling Input Tensor Device Placement
+
+ By default, the Python backend moves all input tensors to the CPU. Starting from
+ the 21.09 release, you can control whether you want to move input tensors to the
+ CPU or let Triton decide the placement of the input tensors. If you let Triton
+ decide the placement, your Python model must be able to handle tensors that are
+ in either CPU or GPU memory. You can control this behavior using the
+ `FORCE_CPU_ONLY_INPUT_TENSORS` setting in your Python model configuration. The
+ default value for this parameter is "yes". By adding the line below to your
+ model config, you let Triton decide the placement of input tensors:
+
+ ```
+ parameters: { key: "FORCE_CPU_ONLY_INPUT_TENSORS" value: {string_value:"no"}}
+ ```
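+
+ For context, the sketch below shows where this parameter sits in a hypothetical
+ `config.pbtxt`; the model name, data types, and shapes are illustrative only,
+ loosely modeled on the add_sub example:
+
+ ```
+ name: "add_sub"
+ backend: "python"
+
+ input [
+   {
+     name: "INPUT0"
+     data_type: TYPE_FP32
+     dims: [ 4 ]
+   }
+ ]
+ output [
+   {
+     name: "OUTPUT0"
+     data_type: TYPE_FP32
+     dims: [ 4 ]
+   }
+ ]
+
+ parameters: { key: "FORCE_CPU_ONLY_INPUT_TENSORS" value: {string_value:"no"}}
+ ```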
+

# Examples

For using the Triton Python client in these examples you need to install
@@ -616,7 +687,7 @@ The Python client for each of the examples is in the `client.py` file.

## AddSub in NumPy

- There is no dependencies required for the AddSub numpy example. Instructions
+ There are no dependencies required for the AddSub NumPy example. Instructions
on how to use this model are explained in the quick start section. You can
find the files in [examples/add_sub](examples/add_sub).
