Commit 510cc49

Author: Hemant Jain
Fix typos + cleanup ReadMe (triton-inference-server#62)

1 parent a4f3138 commit 510cc49

File tree: 1 file changed (+13 -13 lines changed)

README.md
Lines changed: 13 additions & 13 deletions
@@ -67,7 +67,7 @@ $ make install
 ```
 
 The following required Triton repositories will be pulled and used in
-the build. By default the "main" branch/tag will be used for each repo
+the build. By default, the "main" branch/tag will be used for each repo
 but the listed CMake argument can be used to override.
 
 * triton-inference-server/backend: -DTRITON_BACKEND_REPO_TAG=[tag]
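
For context on the CMake override mentioned above, a minimal sketch of the build invocation, assuming an out-of-source build; the install prefix and the release tag `r22.01` are illustrative only:

```
$ mkdir build && cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install \
        -DTRITON_BACKEND_REPO_TAG=r22.01 ..
$ make install
```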
@@ -100,10 +100,10 @@ $ make install
 ### Parameters
 
 Triton exposes some flags to control the execution mode of the TorchScript models through
-the Parameters section of the model's 'config.pbtxt' file.
+the Parameters section of the model's `config.pbtxt` file.
 
 * `DISABLE_OPTIMIZED_EXECUTION`: Boolean flag to disable the optimized execution
-of TorchScript models. By default the optimized execuiton is always enabled.
+of TorchScript models. By default, the optimized execution is always enabled.
 
 The initial calls to a loaded TorchScript model take extremely long. Due to this longer
 model warmup [issue](https://github.com/pytorch/pytorch/issues/57894), Triton also allows
@@ -117,13 +117,13 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "DISABLE_OPTIMIZED_EXECUTION"
 value: {
-string_value:"true"
+string_value: "true"
 }
 }
 ```
 
 * `INFERENCE_MODE`: Boolean flag to enable the Inference Mode execution
-of TorchScript models. By default the inference mode is disabled.
+of TorchScript models. By default, the inference mode is disabled.
 
 [InferenceMode](https://pytorch.org/cppdocs/notes/inference_mode.html) is a new
 RAII guard analogous to NoGradMode to be used when you are certain your operations
@@ -139,14 +139,14 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "INFERENCE_MODE"
 value: {
-string_value:"true"
+string_value: "true"
 }
 }
 ```
 
 * `ENABLE_NVFUSER`: Boolean flag to enable the NvFuser (CUDA Graph
 Fuser) optimization for TorchScript models. If not specified, the
-default pytorch fuser is used. If `ENABLE_NVFUSER` is specified, the
+default PyTorch fuser is used. If `ENABLE_NVFUSER` is specified, the
 `ENABLE_TENSOR_FUSER` configuration (see below) is ignored.
 
 Please note that in some models generated using trace in old PyTorch versions might not work
@@ -159,7 +159,7 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "ENABLE_NVFUSER"
 value: {
-string_value:"true"
+string_value: "true"
 }
 }
 ```
@@ -174,7 +174,7 @@ The section of model config file specifying this parameter will look like:
 parameters: {
 key: "ENABLE_WEIGHT_SHARING"
 value: {
-string_value:"true"
+string_value: "true"
 }
 }
 ```
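
For context on the Parameters section touched by these hunks, the per-model flags can be combined in a single `config.pbtxt`. A minimal sketch with illustrative values only:

```
parameters: {
  key: "INFERENCE_MODE"
  value: {
    string_value: "true"
  }
}
parameters: {
  key: "ENABLE_NVFUSER"
  value: {
    string_value: "true"
  }
}
```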
@@ -191,18 +191,18 @@ complex execution modes and dynamic shapes. If not specified, all are enabled by
 
 ### Important Note
 
-* The execution of pytorch model on GPU is asynchronous in nature. See
+* The execution of PyTorch model on GPU is asynchronous in nature. See
 [here](https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution)
-for more details. Consequently, an error in pytorch model execution may
+for more details. Consequently, an error in PyTorch model execution may
 be raised during the next few inference requests to the server. Setting
 environment variable `CUDA_LAUNCH_BLOCKING=1` when launching server will
 help in correctly debugging failing cases by forcing synchronous execution.
 * The PyTorch model in such cases may or may not recover from the failed
 state and a restart of the server may be required to continue serving
 successfully.
 
-* Multiple instances of the pytorch model on GPU do not always
-increase performance. Due to thread specific caching in pytorch, using
+* Multiple instances of the PyTorch model on GPU do not always
+increase performance. Due to thread specific caching in PyTorch, using
 multiple instances of the model interact negatively. See
 [here](https://github.com/pytorch/pytorch/issues/27902) for more details.
 Setting the parameter `DISABLE_OPTIMIZED_EXECUTION` to "true" in the model
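
For context on the `CUDA_LAUNCH_BLOCKING` debugging note above, a minimal sketch of launching the server with synchronous CUDA execution forced; the model repository path is hypothetical:

```
$ CUDA_LAUNCH_BLOCKING=1 tritonserver --model-repository=/models
```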
