
Commit 0aef2c4

Add decoupled support for BLS (triton-inference-server#203)

* Add CMAKE_BUILD_TYPE flag to CMakeLists.txt
* Add decoupled support for BLS
* Add execution timeout to the API
* Update copyright
* Remove the wrong condition check for exec
* Add examples
* Use Release as default CMAKE_BUILD_TYPE
* Rename variable
* Update example models
* Add documentation for BLS decoupled support
* Returns generator from stream_exec function
* Fix for completed response
* Set futures in the constructor of InferResponse
* Use the server API to set timeout
* Format
* Add 'decoupled' argument to exec() function. Remove stream_exec() and async_stream_exec()
* Address comments
* Rename 'execution_timeout' to 'timeout'
* Remove unused variable and functions
* Make 'timeout' be part of the InferRequest constructor
* Move class 'ResponseGenerator' to a new file
* Fix up
* Update document for 'timeout' changes
* Remove the len() function for ResponseGenerator
* Remove promise from InferRequest object
* Wording
* Fix up
* Address comment
* Fix up
* Change the release version

1 parent 3297448 commit 0aef2c4

24 files changed, with 1596 additions and 274 deletions

CMakeLists.txt

Lines changed: 8 additions & 0 deletions

@@ -42,6 +42,10 @@ set(TRITON_BACKEND_REPO_TAG "main" CACHE STRING "Tag for triton-inference-server
 set(TRITON_COMMON_REPO_TAG "main" CACHE STRING "Tag for triton-inference-server/common repo")
 set(TRITON_CORE_REPO_TAG "main" CACHE STRING "Tag for triton-inference-server/core repo")

+if(NOT CMAKE_BUILD_TYPE)
+  set(CMAKE_BUILD_TYPE Release)
+endif()
+
 #
 # Dependencies
 #
@@ -170,6 +174,8 @@ set(
 src/request_executor.h
 src/stub_launcher.h
 src/stub_launcher.cc
+src/infer_payload.h
+src/infer_payload.cc
 )

 list(APPEND
@@ -190,6 +196,8 @@ set(
 src/response_sender.h
 src/pb_stub.h
 src/pb_stub.cc
+src/pb_generator.h
+src/pb_generator.cc
 )

 list(APPEND

README.md

Lines changed: 264 additions & 99 deletions
Large diffs are not rendered by default.

examples/bls_decoupled/README.md

Lines changed: 163 additions & 0 deletions

@@ -0,0 +1,163 @@
<!--
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Example of using BLS with decoupled models

In this section, we demonstrate an end-to-end example of
[BLS](../../README.md#business-logic-scripting) in the Python backend. The
[model repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md)
should contain the [square](../decoupled) model. The [square](../decoupled)
model sends 'n' responses, where 'n' is the value of input `IN`. In each
response, output `OUT` equals the value of `IN`. This example is divided into
two sections. The first section demonstrates how to perform synchronous BLS
requests, and the second section shows how to execute asynchronous BLS requests.

## Synchronous BLS Requests with Decoupled Models

The goal of the `bls_decoupled_sync` model is to calculate the sum of the
responses returned from the [square](../decoupled) model and return that sum as
the final response. The value of input `IN` is passed as an input to the
[square](../decoupled) model and determines how many responses the
[square](../decoupled) model generates.

1. Create the model repository:

```console
mkdir -p models/bls_decoupled_sync/1
mkdir -p models/square_int32/1

# Copy the Python models
cp examples/bls_decoupled/sync_model.py models/bls_decoupled_sync/1/model.py
cp examples/bls_decoupled/sync_config.pbtxt models/bls_decoupled_sync/config.pbtxt
cp examples/decoupled/square_model.py models/square_int32/1/model.py
cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
```

2. Start the tritonserver:

```
tritonserver --model-repository `pwd`/models
```

3. Send inference requests to the server:

```
python3 examples/bls_decoupled/sync_client.py
```

You should see output similar to the following:

```
==========model result==========
The square value of [4] is [16]

==========model result==========
The square value of [2] is [4]

==========model result==========
The square value of [0] is [0]

==========model result==========
The square value of [1] is [1]

PASS: BLS Decoupled Sync
```

The [sync_model.py](./sync_model.py) model file is heavily commented with
explanations of each function call.

### Explanation of the Client Output

The [sync_client.py](./sync_client.py) script sends four inference requests to
the `bls_decoupled_sync` model with the inputs [4], [2], [0], and [1],
respectively. Because the square model returns 'n' responses, each equal to 'n',
their sum is n * n, so the client expects each output to be the square of the input.

## Asynchronous BLS Requests with Decoupled Models

In this section, we explain how to send multiple BLS requests without waiting
for their responses. Asynchronous execution of BLS requests does not block your
model execution and can lead to speedups under certain conditions.

The `bls_decoupled_async` model performs two asynchronous BLS requests on the
[square](../decoupled) model and then waits until both inference requests are
completed. It calculates the sum of the output `OUT` across the responses of
both requests and uses that sum to construct the final inference response.

1. Create the model repository:

```console
mkdir -p models/bls_decoupled_async/1
mkdir -p models/square_int32/1

# Copy the Python models
cp examples/bls_decoupled/async_model.py models/bls_decoupled_async/1/model.py
cp examples/bls_decoupled/async_config.pbtxt models/bls_decoupled_async/config.pbtxt
cp examples/decoupled/square_model.py models/square_int32/1/model.py
cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
```

2. Start the tritonserver:

```
tritonserver --model-repository `pwd`/models
```

3. Send inference requests to the server:

```
python3 examples/bls_decoupled/async_client.py
```

You should see output similar to the following:

```
==========model result==========
Two times the square value of [4] is [32]

==========model result==========
Two times the square value of [2] is [8]

==========model result==========
Two times the square value of [0] is [0]

==========model result==========
Two times the square value of [1] is [2]

PASS: BLS Decoupled Async
```

The [async_model.py](./async_model.py) model file is heavily commented with
explanations of each function call.

### Explanation of the Client Output

The [async_client.py](./async_client.py) script sends four inference requests to
the `bls_decoupled_async` model with the inputs [4], [2], [0], and [1],
respectively. Because the async BLS model sums the responses of two requests to
the square model, each of which sums to n * n, the client expects each output
to be two times the square of the input.
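The arithmetic behind the synchronous example can be checked without a running server. The sketch below is a pure-Python simulation, not the Triton API: `square_model_responses` and `bls_decoupled_sync` are hypothetical names. The first stands in for the decoupled `square` model (it yields 'n' values, each equal to 'n', as the README describes); the second plays the role of the BLS model, assuming, per the commit message, that a decoupled `exec()` call hands back a generator of responses that the caller iterates to completion.

```python
from typing import Iterator


def square_model_responses(n: int) -> Iterator[int]:
    """Hypothetical stand-in for the decoupled `square` model.

    Yields `n` responses, and each response carries OUT == n.
    """
    for _ in range(n):
        yield n


def bls_decoupled_sync(n: int) -> int:
    """Consume every streamed response and return their sum (n * n)."""
    total = 0
    for out in square_model_responses(n):
        total += out
    return total


if __name__ == "__main__":
    # Same inputs as sync_client.py, printed in the README's output format.
    for value in [4, 2, 0, 1]:
        print(f"The square value of [{value}] is [{bls_decoupled_sync(value)}]")
```

An input of 0 produces no responses at all, so the sum is 0; this matches the `[0] is [0]` line in the expected client output.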
Lines changed: 65 additions & 0 deletions

@@ -0,0 +1,65 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

from tritonclient.utils import np_to_triton_dtype
import tritonclient.http as httpclient
import numpy as np
import sys

model_name = "bls_decoupled_async"
shape = [1]

with httpclient.InferenceServerClient("localhost:8000") as client:
    in_values = [4, 2, 0, 1]

    for in_value in in_values:
        input_data = np.array([in_value], dtype=np.int32)
        inputs = [
            httpclient.InferInput("IN", input_data.shape,
                                  np_to_triton_dtype(input_data.dtype))
        ]
        inputs[0].set_data_from_numpy(input_data)
        outputs = [httpclient.InferRequestedOutput("SUM")]

        response = client.infer(model_name,
                                inputs,
                                request_id=str(1),
                                outputs=outputs)

        result = response.get_response()
        # output_data contains two times the square of the input value.
        output_data = response.as_numpy("SUM")
        print("==========model result==========")
        print("Two times the square value of {} is {}\n".format(input_data, output_data))

        if not np.allclose((2 * input_data * input_data), output_data):
            print(
                "BLS Decoupled Async example error: incorrect output value. Expected {}, got {}."
                .format((2 * input_data * input_data), output_data))
            sys.exit(1)

    print('PASS: BLS Decoupled Async')
    sys.exit(0)
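The client above expects `SUM` to equal two times the square of `IN`. A minimal `asyncio` sketch of why, assuming the async model issues two identical requests to the decoupled square model and adds up every streamed `OUT` value: two hypothetical response streams are consumed concurrently with `asyncio.gather`, and each stream sums to n * n. All names here are hypothetical; the real `async_model.py` uses Triton's BLS API, which this sketch does not call.

```python
import asyncio
from typing import AsyncIterator


async def square_model_stream(n: int) -> AsyncIterator[int]:
    """Hypothetical async stand-in for the decoupled `square` model."""
    for _ in range(n):
        await asyncio.sleep(0)  # yield control so the two streams interleave
        yield n


async def sum_stream(n: int) -> int:
    """Sum every OUT value streamed back for one request."""
    total = 0
    async for out in square_model_stream(n):
        total += out
    return total


async def bls_decoupled_async(n: int) -> int:
    # Launch both requests without waiting on either, then wait for both.
    first, second = await asyncio.gather(sum_stream(n), sum_stream(n))
    return first + second


if __name__ == "__main__":
    # Same inputs as async_client.py, printed in the README's output format.
    for value in [4, 2, 0, 1]:
        result = asyncio.run(bls_decoupled_async(value))
        print(f"Two times the square value of [{value}] is [{result}]")
```

`asyncio.gather` is used because it returns results in the order the awaitables were passed, which keeps the two partial sums easy to combine.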
Lines changed: 45 additions & 0 deletions

@@ -0,0 +1,45 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: "bls_decoupled_async"
backend: "python"

input [
  {
    name: "IN"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
output [
  {
    name: "SUM"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]

instance_group [{ kind: KIND_CPU }]
