@@ -67,7 +67,7 @@ $ make install
 ```
 
 The following required Triton repositories will be pulled and used in
-the build. By default the "main" branch/tag will be used for each repo
+the build. By default, the "main" branch/tag will be used for each repo
 but the listed CMake argument can be used to override.
 
 * triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=[tag]
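+
+For example, to build against a release branch rather than "main", the tag
+can be passed at CMake configure time. A sketch (the `r21.10` tag is only
+illustrative; the other repositories listed here are overridden the same way):
+
+```
+$ mkdir build && cd build
+$ cmake -DTRITON_BACKEND_REPO_TAG=r21.10 ..
+$ make install
+```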
@@ -100,10 +100,10 @@ $ make install
 ### Parameters
 
 Triton exposes some flags to control the execution mode of the TorchScript models through
-the Parameters section of the model's 'config.pbtxt' file.
+the Parameters section of the model's `config.pbtxt` file.
 
 * `DISABLE_OPTIMIZED_EXECUTION`: Boolean flag to disable the optimized execution
-of TorchScript models. By default the optimized execuiton is always enabled.
+of TorchScript models. By default, the optimized execution is always enabled.
 
 The initial calls to a loaded TorchScript model can take an extremely long time. Due to this longer
 model warmup [issue](https://github.com/pytorch/pytorch/issues/57894), Triton also allows
@@ -117,13 +117,13 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "DISABLE_OPTIMIZED_EXECUTION"
     value: {
-    string_value:"true"
+    string_value: "true"
     }
 }
 ```
 
 * `INFERENCE_MODE`: Boolean flag to enable the Inference Mode execution
-of TorchScript models. By default the inference mode is disabled.
+of TorchScript models. By default, the inference mode is disabled.
 
 [InferenceMode](https://pytorch.org/cppdocs/notes/inference_mode.html) is a new
 RAII guard analogous to NoGradMode to be used when you are certain your operations
@@ -139,14 +139,14 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "INFERENCE_MODE"
     value: {
-    string_value:"true"
+    string_value: "true"
     }
 }
 ```
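+
+As a rough Python-level illustration of what this guard means (a sketch, not
+part of the backend; assumes PyTorch >= 1.9 and a hypothetical `model.pt`
+TorchScript file):
+
+```
+import torch
+
+model = torch.jit.load("model.pt")     # hypothetical TorchScript model
+example = torch.randn(1, 3, 224, 224)  # hypothetical input shape
+
+# torch.inference_mode() is the Python counterpart of the C++ InferenceMode
+# RAII guard: tensors created inside it cannot later be used with autograd.
+with torch.inference_mode():
+    output = model(example)
+```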
 
 * `ENABLE_NVFUSER`: Boolean flag to enable the NvFuser (CUDA Graph
 Fuser) optimization for TorchScript models. If not specified, the
-default pytorch fuser is used. If `ENABLE_NVFUSER` is specified, the
+default PyTorch fuser is used. If `ENABLE_NVFUSER` is specified, the
 `ENABLE_TENSOR_FUSER` configuration (see below) is ignored.
 
 Please note that some models generated using trace in old PyTorch versions might not work
@@ -159,7 +159,7 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "ENABLE_NVFUSER"
     value: {
-    string_value:"true"
+    string_value: "true"
     }
 }
 ```
@@ -174,7 +174,7 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "ENABLE_WEIGHT_SHARING"
     value: {
-    string_value:"true"
+    string_value: "true"
     }
 }
 ```
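+
+Weight sharing is only relevant when more than one execution instance of the
+model runs on the same device. A sketch of a config that pairs this flag with
+an `instance_group` of two GPU instances (the count and kind are only
+illustrative):
+
+```
+instance_group [
+  {
+    count: 2
+    kind: KIND_GPU
+  }
+]
+parameters: {
+key: "ENABLE_WEIGHT_SHARING"
+    value: {
+    string_value: "true"
+    }
+}
+```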
@@ -191,18 +191,18 @@ complex execution modes and dynamic shapes. If not specified, all are enabled by
 
 ### Important Note
 
-* The execution of pytorch model on GPU is asynchronous in nature. See
+* The execution of a PyTorch model on GPU is asynchronous in nature. See
   [here](https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution)
-  for more details. Consequently, an error in pytorch model execution may
+  for more details. Consequently, an error in PyTorch model execution may
   be raised during the next few inference requests to the server. Setting
   the environment variable `CUDA_LAUNCH_BLOCKING=1` when launching the server will
   help in correctly debugging failing cases by forcing synchronous execution.
   * The PyTorch model in such cases may or may not recover from the failed
     state and a restart of the server may be required to continue serving
     successfully.
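+
+  For example, forcing synchronous execution when launching the server (a
+  sketch: the model repository path is only illustrative, the rest is a
+  standard `tritonserver` invocation):
+
+  ```
+  $ CUDA_LAUNCH_BLOCKING=1 tritonserver --model-repository=/models
+  ```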
 
-* Multiple instances of the pytorch model on GPU do not always
-  increase performance. Due to thread specific caching in pytorch, using
+* Multiple instances of the PyTorch model on GPU do not always
+  increase performance. Due to thread-specific caching in PyTorch,
   multiple instances of the model can interact negatively. See
   [here](https://github.com/pytorch/pytorch/issues/27902) for more details.
   Setting the parameter `DISABLE_OPTIMIZED_EXECUTION` to "true" in the model