
Commit e768478

Add documentation for DLPack tensor (triton-inference-server#80)
* Add documentation for DLPack tensor
* Review edits
1 parent 9873850 commit e768478

File tree: 1 file changed (+72 −1 lines)
README.md

Lines changed: 72 additions & 1 deletion
@@ -45,6 +45,7 @@ any C++ code.

* [Managing Shared Memory](#managing-shared-memory)
* [Building From Source](#building-from-source)
* [Business Logic Scripting (beta)](#business-logic-scripting-beta)
* [Interoperability and GPU Support](#interoperability-and-gpu-support)

## Quick Start
@@ -608,6 +609,76 @@ do not create a circular dependency. For example, if model A performs an inferen
on itself and there are no more model instances ready to execute the inference request, the
model will block on the inference execution forever.

# Interoperability and GPU Support

Starting from the 21.09 release, the Python backend supports
[DLPack](https://github.com/dmlc/dlpack) for zero-copy transfer of Python
backend tensors to other frameworks. The methods below are added to the
`pb_utils.Tensor` object to facilitate this:

## `pb_utils.Tensor.to_dlpack() -> PyCapsule`

This method can be called on existing instantiated tensors to convert
a Tensor to DLPack. The code snippet below shows how this works with PyTorch:

```python
from torch.utils.dlpack import from_dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def execute(self, requests):
        ...
        input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")

        # We have converted a Python backend tensor to a PyTorch tensor
        # without making any copies.
        pytorch_tensor = from_dlpack(input0.to_dlpack())
```
## `pb_utils.Tensor.from_dlpack() -> Tensor`

This static method can be used for creating a `Tensor` object from the DLPack
encoding of a tensor. For example:

```python
from torch.utils.dlpack import to_dlpack
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def execute(self, requests):
        ...
        pytorch_tensor = torch.tensor([1, 2, 3], device='cuda')

        # Create a Python backend tensor from the DLPack encoding of a
        # PyTorch tensor.
        input0 = pb_utils.Tensor.from_dlpack(to_dlpack(pytorch_tensor))
```

This method only supports contiguous tensors that are in C order. If the tensor
is not C-order contiguous, an exception will be raised.
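For illustration only (this sketch is not part of the commit), one way to satisfy the contiguity requirement with PyTorch is to make the tensor contiguous before encoding it. The `.contiguous()` and `.is_contiguous()` calls below are standard PyTorch, and the tensor shape is made up:

```python
import torch
from torch.utils.dlpack import to_dlpack
import triton_python_backend_utils as pb_utils

# A transposed view is a common source of non-C-contiguous memory.
pytorch_tensor = torch.ones((2, 3)).t()

if not pytorch_tensor.is_contiguous():
    # .contiguous() copies the data into C-order storage so that the
    # resulting DLPack capsule can be accepted by from_dlpack.
    pytorch_tensor = pytorch_tensor.contiguous()

input0 = pb_utils.Tensor.from_dlpack(to_dlpack(pytorch_tensor))
```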
## `pb_utils.Tensor.is_cpu() -> bool`

This function can be used to check whether a tensor is placed on the CPU or not.
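As a hedged sketch (again, not from the commit), `is_cpu()` lets a model branch between a host-side NumPy path and a DLPack path. The helper name is hypothetical, and the sketch assumes `as_numpy()` is only meaningful for CPU tensors:

```python
from torch.utils.dlpack import from_dlpack
import torch
import triton_python_backend_utils as pb_utils


def as_torch(tensor):
    # Hypothetical helper, not part of pb_utils: pick a conversion path
    # based on where the Python backend tensor lives.
    if tensor.is_cpu():
        # Host memory: assume a NumPy view is available here.
        return torch.from_numpy(tensor.as_numpy())
    # Device memory: go through DLPack to avoid a copy.
    return from_dlpack(tensor.to_dlpack())
```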
## Controlling Input Tensor Device Placement

By default, the Python backend moves all input tensors to CPU. Starting from
the 21.09 release, you can control whether you want to move input tensors to
CPU or let Triton decide the placement of the input tensors. If you let Triton
decide the placement, your Python model must be able to handle tensors that
are on the CPU or the GPU. You can control this using the
`FORCE_CPU_ONLY_INPUT_TENSORS` setting in your Python model configuration. The
default value for this parameter is "yes". By adding the line below to your
model config, you let Triton decide the placement of input tensors:

```
parameters: { key: "FORCE_CPU_ONLY_INPUT_TENSORS" value: {string_value:"no"}}
```
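A minimal sketch of a model that tolerates either placement is shown below. It is not from the commit: the tensor names "INPUT0"/"OUTPUT0" and the doubling computation are made up, and it relies on PyTorch's `from_dlpack` accepting both CPU and GPU DLPack tensors:

```python
from torch.utils.dlpack import from_dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # DLPack preserves the device, so this line works whether Triton
            # placed the tensor on the CPU or on the GPU.
            t0 = from_dlpack(in0.to_dlpack())

            result = t0 * 2  # placeholder computation

            # Copy the result back to host memory and wrap it as an output
            # tensor. The output name "OUTPUT0" is illustrative only.
            out = pb_utils.Tensor("OUTPUT0", result.cpu().numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```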
# Examples

For using the Triton Python client in these examples you need to install

@@ -616,7 +687,7 @@ The Python client for each of the examples is in the `client.py` file.

## AddSub in NumPy

-There is no dependencies required for the AddSub numpy example. Instructions
+There is no dependencies required for the AddSub NumPy example. Instructions
on how to use this model is explained in the quick start section. You can
find the files in [examples/add_sub](examples/add_sub).