
Commit 8f845c0

Add documentation for BLS (triton-inference-server#70)
* Add documentation for BLS
* Review edits
1 parent 4c01991 commit 8f845c0

10 files changed: +484, −20 lines

README.md

Lines changed: 84 additions & 7 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+# Copyright 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -44,6 +44,7 @@ any C++ code.
 * [Error Handling](#error-handling)
 * [Managing Shared Memory](#managing-shared-memory)
 * [Building From Source](#building-from-source)
+* [Business Logic Scripting (beta)](#business-logic-scripting-beta)

 ## Quick Start

@@ -471,6 +472,79 @@ properly set the `--shm-size` flag depending on the size of your inputs and
outputs. The default value for the `docker run` command is `64MB`, which is very
small.

# Business Logic Scripting (beta)

Triton's
[ensemble](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models)
feature supports many use cases where multiple models are composed into a
pipeline (or more generally a DAG, a directed acyclic graph). However, there are
many other use cases that are not supported because, as part of the model
pipeline, they require loops, conditionals (if-then-else), data-dependent
control flow, and other custom logic to be intermixed with model execution. We
call this combination of custom logic and model executions *Business Logic
Scripting (BLS)*.

Starting from 21.08, you can implement BLS in your Python model. A new set of
utility functions allows you to execute inference requests on other models being
served by Triton as a part of executing your Python model. The example below
shows how to use this feature:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    ...
    def execute(self, requests):
        ...
        # Create an InferenceRequest object. `model_name`,
        # `requested_output_names`, and `inputs` are the required arguments and
        # must be provided when constructing an InferenceRequest object. Make
        # sure to replace the `inputs` argument with a list of
        # `pb_utils.Tensor` objects.
        inference_request = pb_utils.InferenceRequest(
            model_name='model_name',
            requested_output_names=['REQUESTED_OUTPUT_1', 'REQUESTED_OUTPUT_2'],
            inputs=[<pb_utils.Tensor object>])

        # `pb_utils.InferenceRequest` supports request_id, correlation_id, and
        # model version in addition to the arguments described above. These
        # arguments are optional. An example containing all the arguments:
        # inference_request = pb_utils.InferenceRequest(model_name='model_name',
        #     requested_output_names=['REQUESTED_OUTPUT_1', 'REQUESTED_OUTPUT_2'],
        #     inputs=[<list of pb_utils.Tensor objects>],
        #     request_id="1", correlation_id=4, model_version=1)

        # Execute the inference_request and wait for the response
        inference_response = inference_request.exec()

        # Check if the inference response has an error
        if inference_response.has_error():
            raise pb_utils.TritonModelException(inference_response.error().message())
        else:
            # Extract the output tensors from the inference response.
            output1 = pb_utils.get_output_tensor_by_name(inference_response, 'REQUESTED_OUTPUT_1')
            output2 = pb_utils.get_output_tensor_by_name(inference_response, 'REQUESTED_OUTPUT_2')

            # Decide the next steps for model execution based on the received
            # output tensors. It is possible to use the same output tensors
            # for the final inference response too.
```
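
As a minimal sketch of what the `inputs` placeholder above might look like,
assuming the target model exposes a single FP32 input named `INPUT0` (the
tensor name, shape, and `model_name` below are illustrative assumptions, not
part of the API):

```python
import numpy as np
import triton_python_backend_utils as pb_utils

# Hypothetical input construction: wrap a numpy array in a pb_utils.Tensor.
# 'INPUT0', 'OUTPUT0', and 'model_name' are placeholders for whatever your
# target model actually declares in its config.pbtxt.
input0 = pb_utils.Tensor('INPUT0', np.ones([1, 4], dtype=np.float32))
inference_request = pb_utils.InferenceRequest(
    model_name='model_name',
    requested_output_names=['OUTPUT0'],
    inputs=[input0])
inference_response = inference_request.exec()
```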

A complete example for BLS in Python backend is included in the
[Examples](#examples) section.

## Limitations

- The number of inference requests that can be executed as a part of your model
execution is limited to the amount of shared memory available to the Triton
server. If you are using Docker to start the Triton server, you can control the
shared memory usage using the
[`--shm-size`](https://docs.docker.com/engine/reference/run/) flag (see the
launch sketch after this list).
- You need to make sure that the inference requests performed as a part of your
model do not create a circular dependency. For example, if model A performs an
inference request on itself and there are no more model instances ready to
execute the inference request, the model will block on the inference execution
forever.
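
For reference, a hedged example of launching the Triton container with a larger
shared memory segment (the image tag, port mappings, and model repository path
are assumptions; adjust them for your release and setup):

```
docker run --gpus all --rm --shm-size=1g \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v`pwd`/models:/models \
  nvcr.io/nvidia/tritonserver:21.08-py3 \
  tritonserver --model-repository=/models
```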
# Examples

For using the Triton Python client in these examples you need to install
@@ -486,12 +560,15 @@ find the files in [examples/add_sub](examples/add_sub).
 ## AddSubNet in PyTorch

 In order to use this model, you need to install PyTorch. We recommend using
-`pip` method mentioned in the [PyTorch
-website](https://pytorch.org/get-started/locally/). Make sure that PyTorch is
-available in the same Python environment as other dependencies. If you need
-to create another Python environment, please refer to the "Changing Python
-Runtime Path" section of this readme. You can find the files for this example
-in [examples/pytorch](examples/pytorch).
+the `pip` method mentioned in the [PyTorch website](https://pytorch.org/get-started/locally/).
+Make sure that PyTorch is available in the same Python environment as other
+dependencies. Alternatively, you can create a [Python Execution Environment](#using-custom-python-execution-environments).
+You can find the files for this example in [examples/pytorch](examples/pytorch).
+
+## Business Logic Scripting
+
+The BLS example needs the dependencies required for both of the above examples.
+You can find the complete example instructions in [examples/bls](examples/bls/README.md).

 # Reporting problems, asking questions

examples/add_sub/client.py

Lines changed: 0 additions & 1 deletion
@@ -25,7 +25,6 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 from tritonclient.utils import *
-import tritonclient.grpc as grpcclient
 import tritonclient.http as httpclient

 import numpy as np

examples/add_sub/config.pbtxt

Lines changed: 0 additions & 2 deletions
@@ -32,15 +32,13 @@ input [
     name: "INPUT0"
     data_type: TYPE_FP32
     dims: [ 4 ]
-
   }
 ]
 input [
   {
     name: "INPUT1"
     data_type: TYPE_FP32
     dims: [ 4 ]
-
   }
 ]
 output [

examples/add_sub/model.py

Lines changed: 0 additions & 2 deletions
@@ -24,8 +24,6 @@
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

-import numpy as np
-import sys
 import json

 # triton_python_backend_utils is available in every Triton Python model. You

examples/bls/README.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
<!--
# Copyright 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#   * Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#   * Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#   * Neither the name of NVIDIA CORPORATION nor the names of its
#     contributors may be used to endorse or promote products derived
#     from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# BLS Example

In this example we demonstrate an end-to-end example for
[BLS](../../README.md#business-logic-scripting-beta) in Python backend. The
[model repository](https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md)
should contain the [PyTorch](../pytorch), [AddSub](../add_sub), and [BLS](../bls)
models. The [PyTorch](../pytorch) and [AddSub](../add_sub) models calculate the
sum and difference of `INPUT0` and `INPUT1` and put the results in `OUTPUT0` and
`OUTPUT1` respectively. The goal of the BLS model is the same as the
[PyTorch](../pytorch) and [AddSub](../add_sub) models, but it does not calculate
the sum and difference itself. Instead, the BLS model passes the input tensors
to the [PyTorch](../pytorch) or [AddSub](../add_sub) model and returns the
responses of that model as the final response. The additional parameter
`MODEL_NAME` determines which model will be used for calculating the final
outputs.

1. Create the model repository:

```console
$ mkdir -p models/add_sub/1
$ mkdir -p models/bls/1
$ mkdir -p models/pytorch/1

# Copy the Python models
$ cp examples/add_sub/model.py models/add_sub/1/
$ cp examples/add_sub/config.pbtxt models/add_sub/
$ cp examples/bls/model.py models/bls/1/
$ cp examples/bls/config.pbtxt models/bls/
$ cp examples/pytorch/model.py models/pytorch/1/
$ cp examples/pytorch/config.pbtxt models/pytorch/
```

2. Start the Triton server:

```
tritonserver --model-repository `pwd`/models
```

3. Send inference requests to the server:

```
python3 examples/bls/client.py
```

You should see an output similar to the output below:

```
=========='add_sub' model result==========
INPUT0 ([0.34984654 0.6808792 0.6509772 0.6211422 ]) + INPUT1 ([0.37917137 0.9080451 0.60789365 0.33425143]) = OUTPUT0 ([0.7290179 1.5889243 1.2588708 0.9553937])
INPUT0 ([0.34984654 0.6808792 0.6509772 0.6211422 ]) - INPUT1 ([0.37917137 0.9080451 0.60789365 0.33425143]) = OUTPUT1 ([-0.02932483 -0.22716594 0.04308355 0.28689077])


=========='pytorch' model result==========
INPUT0 ([0.34984654 0.6808792 0.6509772 0.6211422 ]) + INPUT1 ([0.37917137 0.9080451 0.60789365 0.33425143]) = OUTPUT0 ([0.7290179 1.5889243 1.2588708 0.9553937])
INPUT0 ([0.34984654 0.6808792 0.6509772 0.6211422 ]) - INPUT1 ([0.37917137 0.9080451 0.60789365 0.33425143]) = OUTPUT1 ([-0.02932483 -0.22716594 0.04308355 0.28689077])


=========='undefined' model result==========
Failed to process the request(s) for model instance 'bls_0', message: TritonModelException: Failed for execute the inference request. Model 'undefined_model' is not ready.

At:
  /tmp/python_backend/models/bls/1/model.py(110): execute
```

The [bls](./model.py) model file is heavily commented with explanations about
each of the function calls.
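
As a rough, simplified sketch of the dispatch logic that model implements
(tensor names follow the client code; the shipped [model.py](./model.py) is the
authoritative, fully commented version):

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the MODEL_NAME string tensor sent by the client and decide
            # which model to forward the math inputs to.
            model_name = pb_utils.get_input_tensor_by_name(request, 'MODEL_NAME')
            model_name_string = model_name.as_numpy()[0]  # may need decoding from bytes

            # Forward INPUT0/INPUT1 to the selected model ('add_sub' or
            # 'pytorch') and wait for its response.
            infer_request = pb_utils.InferenceRequest(
                model_name=model_name_string,
                requested_output_names=['OUTPUT0', 'OUTPUT1'],
                inputs=[pb_utils.get_input_tensor_by_name(request, 'INPUT0'),
                        pb_utils.get_input_tensor_by_name(request, 'INPUT1')])
            infer_response = infer_request.exec()

            # Surface any downstream error as this model's error.
            if infer_response.has_error():
                raise pb_utils.TritonModelException(
                    infer_response.error().message())

            # Reuse the downstream output tensors as the final response.
            responses.append(pb_utils.InferenceResponse(
                output_tensors=infer_response.output_tensors()))
        return responses
```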

## Explanation of the Client Output

The [client.py](./client.py) sends three inference requests to the 'bls'
model with different values for the "MODEL_NAME" input. As explained earlier,
"MODEL_NAME" determines the model name that the "bls" model will use for
calculating the final outputs. In the first request, it will use the "add_sub"
model and in the second request it will use the "pytorch" model. The third
request uses an incorrect model name to demonstrate error handling during
the inference request execution.
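
For context, the extra input could be declared in the BLS model's
`config.pbtxt` along these lines (a sketch inferred from the client code; the
shipped [config.pbtxt](./config.pbtxt) is authoritative):

```
input [
  {
    name: "MODEL_NAME"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
```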

examples/bls/client.py

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Copyright 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#   * Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#   * Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#   * Neither the name of NVIDIA CORPORATION nor the names of its
#     contributors may be used to endorse or promote products derived
#     from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

from tritonclient.utils import *
import tritonclient.http as httpclient
import numpy as np

model_name = "bls"
shape = [4]

with httpclient.InferenceServerClient("localhost:8000") as client:
    input0_data = np.random.rand(*shape).astype(np.float32)
    input1_data = np.random.rand(*shape).astype(np.float32)
    inputs = [
        httpclient.InferInput("INPUT0", input0_data.shape,
                              np_to_triton_dtype(input0_data.dtype)),
        httpclient.InferInput("INPUT1", input1_data.shape,
                              np_to_triton_dtype(input1_data.dtype)),
        httpclient.InferInput("MODEL_NAME", [1],
                              np_to_triton_dtype(np.object_)),
    ]
    inputs[0].set_data_from_numpy(input0_data)
    inputs[1].set_data_from_numpy(input1_data)

    # Will perform the inference request on the 'add_sub' model.
    inputs[2].set_data_from_numpy(np.array(['add_sub'], dtype=np.object_))

    outputs = [
        httpclient.InferRequestedOutput("OUTPUT0"),
        httpclient.InferRequestedOutput("OUTPUT1"),
    ]

    response = client.infer(model_name,
                            inputs,
                            request_id=str(1),
                            outputs=outputs)

    result = response.get_response()
    print("=========='add_sub' model result==========")
    print("INPUT0 ({}) + INPUT1 ({}) = OUTPUT0 ({})".format(
        input0_data, input1_data, response.as_numpy("OUTPUT0")))
    print("INPUT0 ({}) - INPUT1 ({}) = OUTPUT1 ({})".format(
        input0_data, input1_data, response.as_numpy("OUTPUT1")))

    # Will perform the inference request on the 'pytorch' model:
    inputs[2].set_data_from_numpy(np.array(['pytorch'], dtype=np.object_))
    response = client.infer(model_name,
                            inputs,
                            request_id=str(1),
                            outputs=outputs)

    result = response.get_response()
    print("\n")
    print("=========='pytorch' model result==========")
    print("INPUT0 ({}) + INPUT1 ({}) = OUTPUT0 ({})".format(
        input0_data, input1_data, response.as_numpy("OUTPUT0")))
    print("INPUT0 ({}) - INPUT1 ({}) = OUTPUT1 ({})".format(
        input0_data, input1_data, response.as_numpy("OUTPUT1")))

    # Will perform the same inference request on an undefined model. This
    # leads to an exception:
    print("\n")
    print("=========='undefined' model result==========")
    try:
        inputs[2].set_data_from_numpy(
            np.array(['undefined_model'], dtype=np.object_))
        response = client.infer(model_name,
                                inputs,
                                request_id=str(1),
                                outputs=outputs)
    except InferenceServerException as e:
        print(e.message())
