<!--
- # Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+ # Copyright 2020-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -44,6 +44,7 @@ any C++ code.
* [Error Handling](#error-handling)
* [Managing Shared Memory](#managing-shared-memory)
* [Building From Source](#building-from-source)
+ * [Business Logic Scripting (beta)](#business-logic-scripting-beta)

## Quick Start

@@ -471,6 +472,79 @@ properly set the `--shm-size` flag depending on the size of your inputs and
outputs. The default value for docker run command is `64MB` which is very
small.

+ # Business Logic Scripting (beta)
+
+ Triton's
+ [ensemble](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models)
+ feature supports many use cases where multiple models are composed into a
+ pipeline (or more generally a DAG, directed acyclic graph). However, many
+ other use cases are not supported because, as part of the model pipeline, they
+ require loops, conditionals (if-then-else), data-dependent control flow, and
+ other custom logic to be intermixed with model execution. We call this
+ combination of custom logic and model executions *Business Logic Scripting (BLS)*.
+
+ Starting from the 21.08 release, you can implement BLS in your Python model. A new
+ set of utility functions allows you to execute inference requests on other models
+ being served by Triton as a part of executing your Python model. The example below
+ shows how to use this feature:
+
+ ```python
+ import triton_python_backend_utils as pb_utils
+
+
+ class TritonPythonModel:
+     ...
+     def execute(self, requests):
+         ...
+         # Create an InferenceRequest object. `model_name`,
+         # `requested_output_names`, and `inputs` are the required arguments and
+         # must be provided when constructing an InferenceRequest object. Make sure
+         # to replace the `inputs` argument with a list of `pb_utils.Tensor` objects.
+         inference_request = pb_utils.InferenceRequest(
+             model_name='model_name',
+             requested_output_names=['REQUESTED_OUTPUT_1', 'REQUESTED_OUTPUT_2'],
+             inputs=[<pb_utils.Tensor object>])
+
+         # `pb_utils.InferenceRequest` supports request_id, correlation_id, and model
+         # version in addition to the arguments described above. These arguments
+         # are optional. An example containing all the arguments:
+         # inference_request = pb_utils.InferenceRequest(model_name='model_name',
+         #     requested_output_names=['REQUESTED_OUTPUT_1', 'REQUESTED_OUTPUT_2'],
+         #     inputs=[<list of pb_utils.Tensor objects>],
+         #     request_id="1", correlation_id=4, model_version=1)
+
+         # Execute the inference_request and wait for the response.
+         inference_response = inference_request.exec()
+
+         # Check if the inference response has an error.
+         if inference_response.has_error():
+             raise pb_utils.TritonModelException(inference_response.error().message())
+         else:
+             # Extract the output tensors from the inference response.
+             output1 = pb_utils.get_output_tensor_by_name(inference_response, 'REQUESTED_OUTPUT_1')
+             output2 = pb_utils.get_output_tensor_by_name(inference_response, 'REQUESTED_OUTPUT_2')
+
+             # Decide the next steps for model execution based on the received
+             # output tensors. It is possible to use the same output tensors
+             # for the final inference response too.
+ ```
+
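+ The last comment above notes that the received output tensors can drive the next
+ step and the final response. As an illustrative sketch only (not part of the
+ example above), the snippet below shows one way that could look; the model names
+ (`classifier`, `model_a`, `model_b`), the tensor names, and the 0.5 threshold are
+ hypothetical and would have to match your own model configurations:
+
+ ```python
+ import triton_python_backend_utils as pb_utils
+
+
+ class TritonPythonModel:
+     def execute(self, requests):
+         responses = []
+         for request in requests:
+             # 'INPUT0' is a hypothetical input name; it must match config.pbtxt.
+             input0 = pb_utils.get_input_tensor_by_name(request, 'INPUT0')
+
+             # First BLS call: ask a (hypothetical) 'classifier' model for a score.
+             score_request = pb_utils.InferenceRequest(
+                 model_name='classifier',
+                 requested_output_names=['SCORE'],
+                 inputs=[pb_utils.Tensor('INPUT0', input0.as_numpy())])
+             score_response = score_request.exec()
+             if score_response.has_error():
+                 raise pb_utils.TritonModelException(
+                     score_response.error().message())
+             score = pb_utils.get_output_tensor_by_name(
+                 score_response, 'SCORE').as_numpy()
+
+             # Data-dependent control flow: pick the next model based on the score.
+             next_model = 'model_a' if score.max() >= 0.5 else 'model_b'
+             next_request = pb_utils.InferenceRequest(
+                 model_name=next_model,
+                 requested_output_names=['OUTPUT0'],
+                 inputs=[pb_utils.Tensor('INPUT0', input0.as_numpy())])
+             next_response = next_request.exec()
+             if next_response.has_error():
+                 raise pb_utils.TritonModelException(
+                     next_response.error().message())
+
+             # Reuse the BLS output tensor for the final response. 'OUTPUT0' must
+             # also be declared as an output of this model in its config.pbtxt.
+             output0 = pb_utils.get_output_tensor_by_name(next_response, 'OUTPUT0')
+             responses.append(pb_utils.InferenceResponse(output_tensors=[output0]))
+         return responses
+ ```
+
+ The same pattern extends to loops: you can keep issuing `InferenceRequest.exec()`
+ calls until a data-dependent stopping condition is met.
+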
+ A complete example of BLS in the Python backend is included in the
+ [Examples](#examples) section.
+
+ ## Limitations
+
+ - The number of inference requests that can be executed as a part of your model
+ execution is limited to the amount of shared memory available to the Triton
+ server. If you are using Docker to start the Triton server, you can control the
+ shared memory usage using the
+ [`--shm-size`](https://docs.docker.com/engine/reference/run/) flag.
+ - You need to make sure that the inference requests performed as a part of your model
+ do not create a circular dependency. For example, if model A performs an inference request
+ on itself and there are no more model instances ready to execute the inference request, the
+ model will block on the inference execution forever.
+
# Examples

For using the Triton Python client in these examples you need to install
@@ -486,12 +560,15 @@ find the files in [examples/add_sub](examples/add_sub).
## AddSubNet in PyTorch

In order to use this model, you need to install PyTorch. We recommend using
- `pip` method mentioned in the [PyTorch
- website](https://pytorch.org/get-started/locally/). Make sure that PyTorch is
- available in the same Python environment as other dependencies. If you need
- to create another Python environment, please refer to the "Changing Python
- Runtime Path" section of this readme. You can find the files for this example
- in [examples/pytorch](examples/pytorch).
+ the `pip` method described on the [PyTorch website](https://pytorch.org/get-started/locally/).
+ Make sure that PyTorch is available in the same Python environment as the other
+ dependencies. Alternatively, you can create a [Python Execution Environment](#using-custom-python-execution-environments).
+ You can find the files for this example in [examples/pytorch](examples/pytorch).
+
+ ## Business Logic Scripting
+
+ The BLS example needs the dependencies required for both of the above examples.
+ You can find the complete example instructions in [examples/bls](examples/bls/README.md).

# Reporting problems, asking questions
