@@ -698,16 +698,17 @@ is not C-order contiguous an exception will be raised.
 
 This function can be used to check whether a tensor is placed in CPU or not.
 
-## Controlling Input Tensor Device Placement
-
-By default Python backend moves all the input tensors to CPU. Starting from
-21.09 release, you can control whether you want to move input tensors to CPU or
-let Triton decide the placement of the input tensors. If you let Triton decide
-the placement of input tensors, your Python model must be able to handle tensors
-that are in CPU or GPU. You can control this using the
-`FORCE_CPU_ONLY_INPUT_TENSORS` setting in your Python model configuration. The
-default value for this parameter is "yes". By adding the line below to your
-model config, you are letting Triton decide the placement of input Tensors:
+## Input Tensor Device Placement
+
+By default, the Python backend moves all input tensors to CPU before providing
+them to the Python model. Starting from the 21.09 release, you can change this
+default behavior. Setting `FORCE_CPU_ONLY_INPUT_TENSORS` to "no" tells Triton
+not to move input tensors to CPU for the Python model. Instead, Triton provides
+the input tensors to the Python model in either CPU or GPU memory, depending on
+how those tensors were last used. You cannot predict which memory will be used
+for each input tensor, so your Python model must be able to handle tensors in
+both CPU and GPU memory. To enable this behavior, add the following to the
+`parameters` section of the model configuration:
 
 ```
 parameters: { key: "FORCE_CPU_ONLY_INPUT_TENSORS" value: {string_value:"no"}}
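
As an illustration of handling both placements, here is a minimal sketch of a Python model's `execute` function. It is not part of this change: the tensor names `INPUT0` and `OUTPUT0` and the doubling computation are placeholders, and the GPU path assumes PyTorch is available for the DLPack import.

```python
import triton_python_backend_utils as pb_utils
import torch
from torch.utils.dlpack import from_dlpack


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            if input_tensor.is_cpu():
                # Input arrived in CPU memory: wrap the NumPy view in a
                # CPU torch tensor.
                data = torch.from_numpy(input_tensor.as_numpy())
            else:
                # Input arrived in GPU memory: import it through DLPack
                # without copying it back to the host.
                data = from_dlpack(input_tensor.to_dlpack())
            # Placeholder computation; runs on whichever device holds the data.
            result = data * 2
            output_tensor = pb_utils.Tensor("OUTPUT0", result.cpu().numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses
```

The `is_cpu()` call is the same check described above; the DLPack round trip on the GPU branch avoids copying the tensor back to the host just to read it.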