Commit 841316b

Add async BLS documentation (triton-inference-server#78)
* Add async BLS documentation
* Review edits
1 parent 71f2828 commit 841316b

8 files changed: +411 −30 lines changed

README.md

Lines changed: 58 additions & 5 deletions
@@ -363,8 +363,9 @@ above. However, it is important to see `libpython3.6m.so.1.0` in the list of
 linked shared libraries. If you use a different Python version, you should see
 that version instead. You need to copy the `triton_python_backend_stub` to the
 model directory of the models that want to use the custom Python backend
-stub. For example, if you have `model_a` in your [model repository](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md), the folder
-structure should look like below:
+stub. For example, if you have `model_a` in your
+[model repository](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md),
+the folder structure should look like below:
 
 ```
 models
@@ -537,10 +538,62 @@ class TritonPythonModel:
 
     # Decide the next steps for model execution based on the received output
     # tensors. It is possible to use the same output tensors to for the final
-    # inference resposne too.
+    # inference response too.
 ```
 
-A complete example for BLS in Python backend is included in the
+
+In addition to the `inference_request.exec` function that allows you to
+execute blocking inference requests, `inference_request.async_exec` allows
+you to perform async inference requests. This can be useful when you do not
+need the result of the inference immediately. Using the `async_exec` function,
+it is possible to have multiple in-flight inference requests and wait for the
+responses only when needed. The example below shows how to use `async_exec`:
+
+```python
+import triton_python_backend_utils as pb_utils
+import asyncio
+
+
+class TritonPythonModel:
+    ...
+
+    # You must add the Python 'async' keyword to the beginning of the
+    # `execute` function if you want to use the `async_exec` function.
+    async def execute(self, requests):
+        ...
+        # Create an InferenceRequest object. `model_name`,
+        # `requested_output_names`, and `inputs` are the required arguments
+        # and must be provided when constructing an InferenceRequest object.
+        # Make sure to replace the `inputs` argument with a list of
+        # `pb_utils.Tensor` objects.
+        inference_request = pb_utils.InferenceRequest(
+            model_name='model_name',
+            requested_output_names=['REQUESTED_OUTPUT_1', 'REQUESTED_OUTPUT_2'],
+            inputs=[<pb_utils.Tensor object>])
+
+        inference_response_awaits = []
+        for i in range(4):
+            # The async_exec function returns an
+            # [Awaitable](https://docs.python.org/3/library/asyncio-task.html#awaitables)
+            # object.
+            inference_response_awaits.append(inference_request.async_exec())
+
+        # Wait for all of the inference requests to complete.
+        inference_responses = await asyncio.gather(*inference_response_awaits)
+
+        for inference_response in inference_responses:
+            # Check if the inference response has an error.
+            if inference_response.has_error():
+                raise pb_utils.TritonModelException(
+                    inference_response.error().message())
+            else:
+                # Extract the output tensors from the inference response.
+                output1 = pb_utils.get_output_tensor_by_name(
+                    inference_response, 'REQUESTED_OUTPUT_1')
+                output2 = pb_utils.get_output_tensor_by_name(
+                    inference_response, 'REQUESTED_OUTPUT_2')
+
+        # Decide the next steps for model execution based on the received
+        # output tensors.
+```
+
+A complete example for sync and async BLS in Python backend is included in the
 [Examples](#examples) section.
 
 ## Limitations
@@ -561,7 +614,7 @@ For using the Triton Python client in these examples you need to install
 the [Triton Python Client Library](https://github.com/triton-inference-server/client#getting-the-client-libraries-and-examples).
 The Python client for each of the examples is in the `client.py` file.
 
-## AddSub in Numpy
+## AddSub in NumPy
 
 There is no dependencies required for the AddSub numpy example. Instructions
 on how to use this model is explained in the quick start section. You can
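
Note: the hunk above documents `async_exec` alongside the blocking `exec` call that the earlier part of the README covers. For comparison, here is a minimal sketch of the blocking flow; it is an illustration only, not part of this commit, and the model name and output name are placeholders borrowed from the documentation snippet above.

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def execute(self, requests):
        responses = []
        for request in requests:
            # Build a BLS request against another model in the repository.
            # 'model_name' and 'REQUESTED_OUTPUT_1' are placeholders, not a
            # real deployment.
            inference_request = pb_utils.InferenceRequest(
                model_name='model_name',
                requested_output_names=['REQUESTED_OUTPUT_1'],
                inputs=request.inputs())

            # exec() blocks until the response is ready; async_exec() would
            # return an awaitable instead.
            inference_response = inference_request.exec()
            if inference_response.has_error():
                raise pb_utils.TritonModelException(
                    inference_response.error().message())

            # Pass the downstream output through as the final response.
            output1 = pb_utils.get_output_tensor_by_name(
                inference_response, 'REQUESTED_OUTPUT_1')
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output1]))
        return responses
```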

examples/bls/README.md

Lines changed: 74 additions & 21 deletions
@@ -28,32 +28,38 @@
 
 # BLS Example
 
-In this example we demonstrate an end-to-end example for
+In this section we demonstrate an end-to-end example for
 [BLS](../../README.md#business-logic-scripting-beta) in Python backend. The
 [model repository](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md)
-should contain [PyTorch](../pytorch), [AddSub](../add_sub), and [BLS](../bls) models.
-The [PyTorch](../pytorch) and [AddSub](../add_sub) models
-calculate the sum and difference of the `INPUT0` and `INPUT1` and put the
-results in `OUTPUT0` and `OUTPUT1` respectively. The goal of the BLS model is
-the same as [PyTorch](../pytorch) and [AddSub](../add_sub) models but the
-difference is that the BLS model will not calculate the sum and difference by
-itself. The BLS model will pass the input tensors to the [PyTorch](../pytorch)
-or [AddSub](../add_sub) models and return the responses of that model as the
-final response. The additional parameter `MODEL_NAME` determines which model
-will be used for calculating the final outputs.
+should contain the [pytorch](../pytorch) and [addsub](../add_sub) models. The
+[pytorch](../pytorch) and [addsub](../add_sub) models calculate the sum and
+difference of the `INPUT0` and `INPUT1` and put the results in `OUTPUT0` and
+`OUTPUT1` respectively. This example is broken into two sections. The first
+section demonstrates how to perform synchronous BLS requests and the second
+section shows how to execute asynchronous BLS requests.
+
+## Synchronous BLS Requests
+
+The goal of the sync BLS model is the same as the [pytorch](../pytorch) and
+[addsub](../add_sub) models, but the difference is that the BLS model will not
+calculate the sum and difference by itself. The sync BLS model will pass the
+input tensors to the [pytorch](../pytorch) or [addsub](../add_sub) models and
+return the responses of that model as the final response. The additional
+parameter `MODEL_NAME` determines which model will be used for calculating the
+final outputs.
 
 1. Create the model repository:
 
 ```console
 $ mkdir -p models/add_sub/1
-$ mkdir -p models/bls/1
+$ mkdir -p models/bls_sync/1
 $ mkdir -p models/pytorch/1
 
 # Copy the Python models
 $ cp examples/add_sub/model.py models/add_sub/1/
-$ cp examples/add_sub/config.pbtxt models/add_sub/
-$ cp examples/bls/model.py models/bls/1/
-$ cp examples/bls/config.pbtxt models/bls/
+$ cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
+$ cp examples/bls/sync_model.py models/bls_sync/1/model.py
+$ cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
 $ cp examples/pytorch/model.py models/pytorch/1/
 $ cp examples/pytorch/config.pbtxt models/pytorch/
 ```
@@ -67,7 +73,7 @@ tritonserver --model-repository `pwd`/models
 3. Send inference requests to server:
 
 ```
-python3 examples/bls/client.py
+python3 examples/bls/sync_client.py
 ```
 
 You should see an output similar to the output below:
@@ -90,15 +96,62 @@ At:
 /tmp/python_backend/models/bls/1/model.py(110): execute
 ```
 
-The [bls](./model.py) model file is heavily commented with explanations about
-each of the function calls.
+The [sync_model.py](./sync_model.py) model file is heavily commented with
+explanations about each of the function calls.
 
-## Explanation of the Client Output
+### Explanation of the Client Output
 
-The [client.py](./client.py) sends three inference requests to the 'bls'
+The [sync_client.py](./sync_client.py) sends three inference requests to the 'bls_sync'
 model with different values for the "MODEL_NAME" input. As explained earlier,
 "MODEL_NAME" determines the model name that the "bls" model will use for
 calculating the final outputs. In the first request, it will use the "add_sub"
-model and in the seceond request it will use the "pytorch" model. The third
+model and in the second request it will use the "pytorch" model. The third
 request uses an incorrect model name to demonstrate error handling during
 the inference request execution.
+
+## Asynchronous BLS Requests
+
+In this section we explain how to send multiple BLS requests without waiting
+for their responses. Asynchronous execution of BLS requests will not block
+your model execution and can lead to speedups under certain conditions.
+
+The `bls_async` model will perform two async BLS requests on the
+[pytorch](../pytorch) and [addsub](../add_sub) models. Then, it will wait
+until the inference requests on these models are completed. It will extract
+`OUTPUT0` from the [pytorch](../pytorch) model and `OUTPUT1` from the
+[addsub](../add_sub) model to construct the final inference response object
+using these tensors.
+
+1. Create the model repository:
+
+```console
+$ mkdir -p models/add_sub/1
+$ mkdir -p models/bls_async/1
+$ mkdir -p models/pytorch/1
+
+# Copy the Python models
+$ cp examples/add_sub/model.py models/add_sub/1/
+$ cp examples/add_sub/config.pbtxt models/add_sub/
+$ cp examples/bls/async_model.py models/bls_async/1/model.py
+$ cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
+$ cp examples/pytorch/model.py models/pytorch/1/
+$ cp examples/pytorch/config.pbtxt models/pytorch/
+```
+
+2. Start the tritonserver:
+
+```
+tritonserver --model-repository `pwd`/models
+```
+
+3. Send inference requests to server:
+
+```
+python3 examples/bls/async_client.py
+```
+
+You should see an output similar to the output below:
+
+```
+INPUT0 ([0.72394824 0.45873794 0.4307444 0.07681174]) + INPUT1 ([0.34224355 0.8271524 0.5831284 0.904624 ]) = OUTPUT0 ([1.0661918 1.2858903 1.0138729 0.9814357])
+INPUT0 ([0.72394824 0.45873794 0.4307444 0.07681174]) - INPUT1 ([0.34224355 0.8271524 0.5831284 0.904624 ]) = OUTPUT1 ([ 0.3817047 -0.36841443 -0.15238398 -0.82781225])
+```
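
The `sync_model.py` file referenced in the steps above is part of the commit but its body is not shown in this page's hunks. Purely as a hedged sketch of the `MODEL_NAME` dispatch described in the Synchronous BLS Requests section — assuming the standard `pb_utils` BLS API, with the exact committed code possibly differing — the core of its `execute` might look like this:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read which downstream model to call ("add_sub" or "pytorch")
            # from the MODEL_NAME input tensor. Note: a TYPE_STRING element
            # may arrive as bytes rather than str.
            model_name = pb_utils.get_input_tensor_by_name(
                request, 'MODEL_NAME').as_numpy()[0]

            # Forward INPUT0/INPUT1 unchanged to the selected model.
            infer_request = pb_utils.InferenceRequest(
                model_name=model_name,
                requested_output_names=['OUTPUT0', 'OUTPUT1'],
                inputs=[
                    pb_utils.get_input_tensor_by_name(request, 'INPUT0'),
                    pb_utils.get_input_tensor_by_name(request, 'INPUT1')
                ])
            infer_response = infer_request.exec()

            # Surface downstream errors (e.g. the third client request,
            # which uses an incorrect model name on purpose).
            if infer_response.has_error():
                raise pb_utils.TritonModelException(
                    infer_response.error().message())

            # Return the downstream model's outputs as the final response.
            responses.append(pb_utils.InferenceResponse(
                output_tensors=infer_response.output_tensors()))
        return responses
```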

examples/bls/async_client.py

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# Copyright 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+from tritonclient.utils import *
+import tritonclient.http as httpclient
+import numpy as np
+
+model_name = "bls_async"
+shape = [4]
+
+with httpclient.InferenceServerClient("localhost:8000") as client:
+    input0_data = np.random.rand(*shape).astype(np.float32)
+    input1_data = np.random.rand(*shape).astype(np.float32)
+    inputs = [
+        httpclient.InferInput("INPUT0", input0_data.shape,
+                              np_to_triton_dtype(input0_data.dtype)),
+        httpclient.InferInput("INPUT1", input1_data.shape,
+                              np_to_triton_dtype(input1_data.dtype)),
+    ]
+    inputs[0].set_data_from_numpy(input0_data)
+    inputs[1].set_data_from_numpy(input1_data)
+
+    outputs = [
+        httpclient.InferRequestedOutput("OUTPUT0"),
+        httpclient.InferRequestedOutput("OUTPUT1"),
+    ]
+
+    response = client.infer(model_name,
+                            inputs,
+                            request_id=str(1),
+                            outputs=outputs)
+
+    result = response.get_response()
+    print("INPUT0 ({}) + INPUT1 ({}) = OUTPUT0 ({})".format(
+        input0_data, input1_data, response.as_numpy("OUTPUT0")))
+    print("INPUT0 ({}) - INPUT1 ({}) = OUTPUT1 ({})".format(
+        input0_data, input1_data, response.as_numpy("OUTPUT1")))
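
The `async_model.py` that this client exercises is likewise part of the commit but not shown on this page. As a rough sketch only — based on the behavior described in the README above (`OUTPUT0` from pytorch, `OUTPUT1` from addsub) and the documented `async_exec` API, not the committed code — its `execute` might be structured like this:

```python
import triton_python_backend_utils as pb_utils
import asyncio


class TritonPythonModel:

    async def execute(self, requests):
        responses = []
        for request in requests:
            inputs = [
                pb_utils.get_input_tensor_by_name(request, 'INPUT0'),
                pb_utils.get_input_tensor_by_name(request, 'INPUT1')
            ]

            # Launch both BLS requests without waiting on either one.
            pytorch_await = pb_utils.InferenceRequest(
                model_name='pytorch',
                requested_output_names=['OUTPUT0'],
                inputs=inputs).async_exec()
            addsub_await = pb_utils.InferenceRequest(
                model_name='add_sub',
                requested_output_names=['OUTPUT1'],
                inputs=inputs).async_exec()

            # Wait until both in-flight requests are complete.
            pytorch_response, addsub_response = await asyncio.gather(
                pytorch_await, addsub_await)
            for infer_response in (pytorch_response, addsub_response):
                if infer_response.has_error():
                    raise pb_utils.TritonModelException(
                        infer_response.error().message())

            # Combine OUTPUT0 from pytorch with OUTPUT1 from add_sub into
            # the final inference response.
            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.get_output_tensor_by_name(pytorch_response, 'OUTPUT0'),
                pb_utils.get_output_tensor_by_name(addsub_response, 'OUTPUT1')
            ]))
        return responses
```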

examples/bls/async_config.pbtxt

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+# Copyright 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+name: "bls_async"
+backend: "python"
+
+input [
+  {
+    name: "INPUT0"
+    data_type: TYPE_FP32
+    dims: [ 4 ]
+  }
+]
+input [
+  {
+    name: "INPUT1"
+    data_type: TYPE_FP32
+    dims: [ 4 ]
+  }
+]
+output [
+  {
+    name: "OUTPUT0"
+    data_type: TYPE_FP32
+    dims: [ 4 ]
+  }
+]
+output [
+  {
+    name: "OUTPUT1"
+    data_type: TYPE_FP32
+    dims: [ 4 ]
+  }
+]
+
+instance_group [{ kind: KIND_CPU }]
