Merged
30 commits
910ab98
Add CMAKE_BUILD_TYPE flag to CMakeLists.txt
krishung5 Jan 10, 2023
5cb4e2a
Add decoupled support for BLS
krishung5 Jan 10, 2023
ccdc065
Add execution timeout to the API
krishung5 Jan 11, 2023
fbe5e79
Update copyright
krishung5 Jan 11, 2023
c3a0397
Remove the wrong condition check for exec
krishung5 Jan 11, 2023
ee165b9
Add examples
krishung5 Jan 12, 2023
83a1bce
Use Release as default CMAKE_BUILD_TYPE
krishung5 Jan 12, 2023
94789bf
Rename variable
krishung5 Jan 12, 2023
59945a1
Update example models
krishung5 Jan 12, 2023
7ac20d1
Add documentation for BLS decoupled support
krishung5 Jan 12, 2023
108900b
Returns generator from stream_exec function
krishung5 Jan 31, 2023
32666af
Fix for completed response
krishung5 Jan 31, 2023
fc0fa30
Set futures in the constructor of InferResponse
krishung5 Feb 1, 2023
89bc459
Use the server API to set timeout
krishung5 Feb 1, 2023
bfff3c9
Format
krishung5 Feb 1, 2023
64a0843
Add 'decoupled' argument to exec() function. Remove stream_exec() and…
krishung5 Feb 1, 2023
2d9d10f
Address comments
krishung5 Feb 6, 2023
1d774c8
Rename 'execution_timeout' to 'timeout'
krishung5 Feb 6, 2023
473d19d
Remove unused variable and functions
krishung5 Feb 6, 2023
b4b600c
Make 'timeout' be part of the InferRequest constructor
krishung5 Feb 6, 2023
d31afb0
Move class 'ResponseGenerator' to a new file
krishung5 Feb 6, 2023
a399542
Fix up
krishung5 Feb 6, 2023
d0a0dcb
Update document for 'timeout' changes
krishung5 Feb 6, 2023
2796309
Remove the len() function for ResponseGenerator
krishung5 Feb 6, 2023
57e8fa5
Remove promise from InferRequest object
krishung5 Feb 7, 2023
1d04487
Wording
krishung5 Feb 7, 2023
7df2da3
Fix up
krishung5 Feb 7, 2023
9c18042
Address comment
krishung5 Feb 8, 2023
193661d
Fix up
krishung5 Feb 8, 2023
b6df7a8
Change the release version
krishung5 Feb 8, 2023
8 changes: 8 additions & 0 deletions CMakeLists.txt
@@ -42,6 +42,10 @@ set(TRITON_BACKEND_REPO_TAG "main" CACHE STRING "Tag for triton-inference-server
set(TRITON_COMMON_REPO_TAG "main" CACHE STRING "Tag for triton-inference-server/common repo")
set(TRITON_CORE_REPO_TAG "main" CACHE STRING "Tag for triton-inference-server/core repo")

if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release)
endif()

#
# Dependencies
#
@@ -170,6 +174,8 @@ set(
src/request_executor.h
src/stub_launcher.h
src/stub_launcher.cc
src/infer_payload.h
src/infer_payload.cc
)

list(APPEND
@@ -190,6 +196,8 @@ set(
src/response_sender.h
src/pb_stub.h
src/pb_stub.cc
src/pb_generator.h
src/pb_generator.cc
)

list(APPEND
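The `if(NOT CMAKE_BUILD_TYPE)` guard added in this hunk makes `Release` the default build type while still letting the caller override it. As a sketch (the build directory layout here is illustrative, not prescribed by the repo):

```shell
# Configure a Debug build instead of the new Release default;
# omitting -DCMAKE_BUILD_TYPE would now fall back to Release.
mkdir -p build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
cmake --build .
```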
363 changes: 264 additions & 99 deletions README.md

Large diffs are not rendered by default.

163 changes: 163 additions & 0 deletions examples/bls_decoupled/README.md
@@ -0,0 +1,163 @@
<!--
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Example of using BLS with decoupled models

In this section we demonstrate an end-to-end example of using
[BLS](../../README.md#business-logic-scripting) with decoupled models in the
Python backend. The
[model repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md)
should contain the [square](../decoupled) model. The [square](../decoupled)
model sends 'n' responses, where 'n' is the value of input `IN`. For each
response, output `OUT` equals the value of `IN`. This example is broken into
two sections: the first demonstrates how to perform synchronous BLS requests,
and the second shows how to execute asynchronous BLS requests.

## Synchronous BLS Requests with Decoupled Models

The goal of the `bls_decoupled_sync` model is to calculate the sum of the
responses returned from the [square](../decoupled) model and return that sum
as its final response. The value of input `IN` is passed to the
[square](../decoupled) model and determines how many responses the
[square](../decoupled) model will generate.

1. Create the model repository:

```console
mkdir -p models/bls_decoupled_sync/1
mkdir -p models/square_int32/1

# Copy the Python models
cp examples/bls_decoupled/sync_model.py models/bls_decoupled_sync/1/model.py
cp examples/bls_decoupled/sync_config.pbtxt models/bls_decoupled_sync/config.pbtxt
cp examples/decoupled/square_model.py models/square_int32/1/model.py
cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
```

2. Start the tritonserver:

```
tritonserver --model-repository `pwd`/models
```

3. Send inference requests to server:

```
python3 examples/bls_decoupled/sync_client.py
```

You should see output similar to the following:

```
==========model result==========
The square value of [4] is [16]

==========model result==========
The square value of [2] is [4]

==========model result==========
The square value of [0] is [0]

==========model result==========
The square value of [1] is [1]

PASS: BLS Decoupled Sync
```

The [sync_model.py](./sync_model.py) model file is heavily commented with
explanations about each of the function calls.

### Explanation of the Client Output

The [sync_client.py](./sync_client.py) script sends 4 inference requests to
the `bls_decoupled_sync` model with inputs [4], [2], [0], and [1]
respectively. Following the behavior of the sync BLS model, the client
expects the output to be the square of each input.
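
For intuition, the sync flow can be pictured in plain Python, with no Triton required. This is only an analogue of the BLS call, not the Triton API: `square_model` stands in for the decoupled square model's response stream, and `bls_decoupled_sync` consumes that stream and sums it, the way the sync BLS model sums the responses it receives:

```python
def square_model(n):
    """Stand-in for the decoupled square model: n responses, each equal to n."""
    for _ in range(n):
        yield n

def bls_decoupled_sync(n):
    """Stand-in for the sync BLS model: consume the stream and return the sum."""
    return sum(square_model(n))

for n in [4, 2, 0, 1]:
    print("==========model result==========")
    # Summing n copies of n gives n squared, matching the client output above.
    print("The square value of [{}] is [{}]\n".format(n, bls_decoupled_sync(n)))
```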

## Asynchronous BLS Requests with Decoupled Models

In this section we explain how to send multiple BLS requests without waiting
for their responses. Asynchronous execution of BLS requests does not block
model execution and can lead to speedups under certain conditions.

The `bls_decoupled_async` model performs two async BLS requests on the
[square](../decoupled) model, then waits until both inference requests are
complete. It calculates the sum of the output `OUT` from the
[square](../decoupled) model across both requests and uses the result to
construct the final inference response.

1. Create the model repository:

```console
mkdir -p models/bls_decoupled_async/1
mkdir -p models/square_int32/1

# Copy the Python models
cp examples/bls_decoupled/async_model.py models/bls_decoupled_async/1/model.py
cp examples/bls_decoupled/async_config.pbtxt models/bls_decoupled_async/config.pbtxt
cp examples/decoupled/square_model.py models/square_int32/1/model.py
cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
```

2. Start the tritonserver:

```
tritonserver --model-repository `pwd`/models
```

3. Send inference requests to server:

```
python3 examples/bls_decoupled/async_client.py
```

You should see output similar to the following:

```
==========model result==========
Two times the square value of [4] is [32]

==========model result==========
Two times the square value of [2] is [8]

==========model result==========
Two times the square value of [0] is [0]

==========model result==========
Two times the square value of [1] is [2]

PASS: BLS Decoupled Async
```

The [async_model.py](./async_model.py) model file is heavily commented with
explanations about each of the function calls.

### Explanation of the Client Output

The [async_client.py](./async_client.py) script sends 4 inference requests to
the `bls_decoupled_async` model with inputs [4], [2], [0], and [1]
respectively. Following the behavior of the async BLS model, the client
expects the output to be two times the square of each input.
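
The async flow can likewise be pictured in plain Python with `asyncio`. This is an analogue of the control flow, not the Triton API: each `square_model` call stands in for one async BLS request to the decoupled square model, and `bls_decoupled_async` issues two of them without waiting before gathering both results:

```python
import asyncio

async def square_model(n):
    """Stand-in for one async BLS request: n responses, each equal to n."""
    total = 0
    for _ in range(n):
        await asyncio.sleep(0)  # yield control, as awaiting a real response would
        total += n
    return total

async def bls_decoupled_async(n):
    # Launch both requests concurrently, then wait for both sums.
    first, second = await asyncio.gather(square_model(n), square_model(n))
    return first + second

for n in [4, 2, 0, 1]:
    result = asyncio.run(bls_decoupled_async(n))
    # Two concurrent requests, each summing to n squared, give 2 * n * n.
    print("Two times the square value of [{}] is [{}]".format(n, result))
```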
65 changes: 65 additions & 0 deletions examples/bls_decoupled/async_client.py
@@ -0,0 +1,65 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

from tritonclient.utils import *
import tritonclient.http as httpclient
import numpy as np
import sys

model_name = "bls_decoupled_async"
shape = [1]

with httpclient.InferenceServerClient("localhost:8000") as client:
in_values = [4, 2, 0, 1]

for in_value in in_values:
input_data = np.array([in_value], dtype=np.int32)
inputs = [
httpclient.InferInput("IN", input_data.shape,
np_to_triton_dtype(input_data.dtype))
]
inputs[0].set_data_from_numpy(input_data)
outputs = [httpclient.InferRequestedOutput("SUM")]

response = client.infer(model_name,
inputs,
request_id=str(1),
outputs=outputs)

result = response.get_response()
# output_data contains two times the square value of the input value.
output_data = response.as_numpy("SUM")
print("==========model result==========")
print("Two times the square value of {} is {}\n".format(input_data, output_data))

if not np.allclose((2*input_data*input_data), output_data):
print(
"BLS Decoupled Async example error: incorrect output value. Expected {}, got {}."
.format((2*input_data*input_data), output_data))
sys.exit(1)

print('PASS: BLS Decoupled Async')
sys.exit(0)
45 changes: 45 additions & 0 deletions examples/bls_decoupled/async_config.pbtxt
@@ -0,0 +1,45 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: "bls_decoupled_async"
backend: "python"

input [
{
name: "IN"
data_type: TYPE_INT32
dims: [ 1 ]
}
]
output [
{
name: "SUM"
data_type: TYPE_INT32
dims: [ 1 ]
}
]

instance_group [{ kind: KIND_CPU }]