[#5860][feat] Add ModelOPT INT4 awq fake quant support in AutoDeploy #7770
Conversation
Force-pushed from 05c0de4 to 7a5647c
📝 Walkthrough

Adds an INT4 quantization path: a new graph transform, "quantize_int4_from_graph", that fuses INT4-weighted linear patterns into a custom op; a new eager-compatible custom op, torch_fake_quant_int4_linear, with a fake handler; optional ModelOpt restore during model build; a legacy tensor-quant op/patch for ModelOpt export; and related tests. Also adds an INT4-AWQ backup module.
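For orientation, the call that the transform emits in place of a matched INT4 linear subgraph has roughly the following structure. This is a minimal sketch: tensor shapes, packing, and the use of empty lists for the unused zero points are assumptions; only the op name and its list-based scale arguments are taken from this PR.

```python
from typing import Optional

import torch

def int4_fake_quant_linear_call(
    x: torch.Tensor,
    w_packed: torch.Tensor,
    bias: Optional[torch.Tensor],
    pre_quant_scale: torch.Tensor,
    amax: torch.Tensor,
) -> torch.Tensor:
    """Illustrates the argument structure of the replacement op.

    Scales and zero points travel as lists of tensors, matching the
    signature referenced in the review (input_scale, weight_scale,
    input_zp, weight_zp).
    """
    return torch.ops.auto_deploy.torch_fake_quant_int4_linear(
        x,                   # activations
        w_packed,            # packed INT4 weight
        bias,                # optional bias
        [pre_quant_scale],   # input_scale (AWQ pre-quant scale)
        [amax],              # weight_scale (per-block amax)
        [],                  # input_zp (assumed empty; unused per the review)
        [],                  # weight_zp (assumed empty; unused per the review)
    )
```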
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant AutoModelFactory as AutoModelForCausalLMFactory
    participant HF as HF Model Loader
    participant ModelOPT as modelopt.torch.opt
    User->>AutoModelFactory: _build_model(model_dir)
    AutoModelFactory->>HF: load model
    HF-->>AutoModelFactory: model
    AutoModelFactory->>AutoModelFactory: check model_dir/modelopt_state.pth
    alt modelopt_state.pth exists
        AutoModelFactory->>ModelOPT: import and torch.load(state)
        ModelOPT-->>AutoModelFactory: modelopt_state
        AutoModelFactory->>ModelOPT: restore_from_modelopt_state(model, state)
        ModelOPT-->>AutoModelFactory: restored model
    else
        Note over AutoModelFactory: Skip restore
    end
    AutoModelFactory-->>User: model (possibly restored)
```
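In plain Python, the restore branch of this diagram amounts to roughly the following (a minimal sketch assuming nvidia-modelopt is installed and that model is an already loaded HF module; the error handling suggested later in the review is omitted):

```python
import os

import torch
import modelopt.torch.opt as mto

def maybe_restore_modelopt(model, model_dir: str):
    """Restore ModelOpt quantization state if the checkpoint ships one."""
    ckpt = os.path.join(model_dir, "modelopt_state.pth")
    if not os.path.exists(ckpt):
        return model  # nothing to restore; keep the vanilla HF model
    # weights_only=False because the state holds ModelOpt metadata, not just tensors
    modelopt_state = torch.load(ckpt, weights_only=False)
    return mto.restore_from_modelopt_state(model, modelopt_state)
```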
```mermaid
sequenceDiagram
    autonumber
    participant Graph as FX GraphModule
    participant PM as ADPatternMatcherPass
    participant Rewriter as INT4QuantFromGraph
    participant Op as auto_deploy::torch_fake_quant_int4_linear
    Rewriter->>PM: register INT4 linear patterns (bias/no-bias)
    Rewriter->>Graph: apply patterns
    PM-->>Rewriter: matches found (count)
    loop for each match
        Rewriter->>Graph: replace subgraph with Op(...)
    end
    Graph-->>Rewriter: transformed GraphModule
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
1-1: Add NVIDIA Apache-2.0 header (2025). Same header as suggested in the other Python files.
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py (1)
1-1: Add NVIDIA Apache-2.0 header (2025). Add the standard header at the top of this file.
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_quant.py (1)
1-1: Add NVIDIA Apache-2.0 header (2025). Add the standard header to this test file as well.
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py (1)
1-1: Add NVIDIA Apache-2.0 header (2025). Per repo guidelines, prepend the standard NVIDIA Apache-2.0 copyright header to all source files.
```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```

tensorrt_llm/_torch/auto_deploy/custom_ops/int4.py.bak (1)
1-331: Remove the backup module from the package: tensorrt_llm/_torch/auto_deploy/custom_ops/int4.py.bak. Backup .bak files inside the package namespace get picked up by tooling and linters and increase the maintenance and dependency surface; remove the file from the PR or relocate it outside the package.
Actions:
- Delete the file from the package, or
- Move to a non-packaged location, e.g. docs/examples/int4_awq/int4_awq_reference.py (excluded from packaging), or tests/helpers/int4_awq_reference.py guarded by an optional HF import.
🧹 Nitpick comments (11)
tensorrt_llm/_torch/auto_deploy/models/patches/modelopt.py (3)
57-67: Silence lint on unused fake-op parameters. Keep the signature but assign the unused arguments to underscores to avoid ARG001 noise.
Apply this diff:
```diff
 def tensor_quant_legacy_fake(
     inputs: torch.Tensor,
     amax: torch.Tensor,
     num_bits: int = 8,
     unsigned: bool = False,
     narrow_range: bool = True,
 ) -> Tuple[torch.Tensor, torch.Tensor]:
+    _ = num_bits, unsigned, narrow_range
     out = torch.empty_like(inputs)
     scl = torch.empty_like(amax)
     return out, scl
```
103-111: remove_patch should also handle a missing ModelOpt import gracefully. Mirror the import guard in the remove path.
Apply this diff:
```diff
 def remove_patch() -> None:
     """Optional helper to restore the original _tensor_quant if needed."""
-    import modelopt.torch.quantization.tensor_quant as tq
+    try:
+        import modelopt.torch.quantization.tensor_quant as tq  # type: ignore[import-not-found]
+    except ImportError:
+        return
@@
-    if orig is not None:
-        setattr(tq, "_tensor_quant", orig)
+    if orig is not None:
+        tq._tensor_quant = orig
```
112-112: Avoid import-time side effects; gate patching behind an opt-in flag. Auto-patching at import can surprise downstream users. Recommend gating with an env var (e.g., AD_ENABLE_MODELOPT_EXPORT_PATCH=1).
Example:
```diff
-apply_patch()
+import os
+if os.getenv("AD_ENABLE_MODELOPT_EXPORT_PATCH") == "1":
+    apply_patch()
```

tensorrt_llm/_torch/auto_deploy/models/hf.py (2)
185-185: Remove unused noqa. The # noqa: E402 directive is not needed inside a function scope and is flagged by Ruff. Apply this diff:
```diff
-    import modelopt.torch.opt as mto  # noqa: E402
+    import modelopt.torch.opt as mto
```
182-192: Use the logger, add import/IO guards, and avoid printing the full model. Prefer ad_logger, catch ImportError/IO errors, and log at debug level to avoid huge dumps. Apply this diff:
```diff
-    # TODO: add to ModelOPT QuantConfigReader/graph transforms
+    # TODO: add to ModelOPT QuantConfigReader/graph transforms
     mto_ckpt_path = os.path.join(self.model, "modelopt_state.pth")
     if os.path.exists(mto_ckpt_path):
-        import modelopt.torch.opt as mto  # noqa: E402
-
-        print(f"Loading ModelOpt checkpoint from {mto_ckpt_path}")
-        modelopt_state = torch.load(mto_ckpt_path, weights_only=False)
-        model = mto.restore_from_modelopt_state(model, modelopt_state)
-        print("Restored model:")
-        print(model)
+        try:
+            import modelopt.torch.opt as mto
+        except ImportError:
+            ad_logger.warning("Found %s but modelopt is not installed; skipping restore.", mto_ckpt_path)
+        else:
+            ad_logger.info("Loading ModelOpt checkpoint from %s", mto_ckpt_path)
+            try:
+                modelopt_state = torch.load(mto_ckpt_path, weights_only=False)
+                model = mto.restore_from_modelopt_state(model, modelopt_state)
+                ad_logger.debug("ModelOpt restore complete for %s", type(model).__name__)
+            except Exception as e:
+                ad_logger.error("Failed to restore from %s: %s", mto_ckpt_path, e)
```

tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py (3)
539-559: Pattern may be overly specific due to detach(); consider removing it or explicitly tolerating it. If the input graph lacks aten.detach, the match can fail. Either drop the detach() in the pattern or add a rule to ignore it. Apply this minimal change:

```diff
-    amax_det = amax.detach()
+    amax_det = amax  # keep the graph simpler for matching
```

Alternatively, add aten.detach.default to the matcher's ignore list if supported by your matcher helper.
568-588: Same detach concern for the bias pattern. Mirror the change from the no-bias variant to maximize the match rate.

```diff
-    amax_det = amax.detach()
+    amax_det = amax
```
596-661: AOT pattern registration tweaks: consider ignoring aten.detach and mark unused args.

- Add an ignore for aten.detach.default (if your helper supports op ignores).
- Prefix unused _apply args (cm, factory, shared_config) with underscores to quiet linters.

Example:

```diff
 class INT4QuantizationFromGraph(BaseTransform):
@@
-    def _apply(
-        self,
-        gm: GraphModule,
-        cm: CachedSequenceInterface,
-        factory: ModelFactory,
-        shared_config: SharedConfig,
-    ) -> Tuple[GraphModule, TransformInfo]:
+    def _apply(
+        self,
+        gm: GraphModule,
+        _cm: CachedSequenceInterface,
+        _factory: ModelFactory,
+        _shared_config: SharedConfig,
+    ) -> Tuple[GraphModule, TransformInfo]:
@@
         register_ad_pattern(
             search_fn=_int4_linear_pattern,
             replace_fn=_int4_linear_repl,
             patterns=patterns,
             dummy_args=dummy_args,
             op_ignore_types={
                 torch.ops.aten.reshape.default: (int,),
                 torch.ops.aten.to.dtype: (torch.dtype,),
+                # optionally ignore detach if present
+                # torch.ops.aten.detach.default: (type(None),),
             },
         )
@@
         register_ad_pattern(
             search_fn=_int4_linear_pattern_2,
             replace_fn=_int4_linear_repl_2,
             patterns=patterns,
             dummy_args=dummy_args_2,
             op_ignore_types={
                 torch.ops.aten.reshape.default: (int,),
                 torch.ops.aten.to.dtype: (torch.dtype,),
+                # torch.ops.aten.detach.default: (type(None),),
             },
         )
```

tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_quant.py (1)
234-277: Guard the INT4 test for environments without CUDA/custom ops. torch_fake_quant_int4_linear calls torch_linear_simple; some CI runners without CUDA/custom ops may fail. Consider a skip or capability check. Example:
```diff
+from _torch_test_utils import trtllm_ops_available
@@
-@pytest.mark.parametrize("use_bias", [False, True])
-@pytest.mark.parametrize(
+@pytest.mark.parametrize("use_bias", [False, True])
+@pytest.mark.parametrize(
     "scale_layout", ["scalar", "vector"]
 )  # broadcast forms for pre_quant_scale
+@pytest.mark.skipif(not torch.cuda.is_available() or not trtllm_ops_available(), reason="Requires TRT-LLM custom ops on CUDA")
 def test_torch_fake_quant_int4_linear_matches_reference(use_bias, scale_layout):
```

tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py (1)
330-341: Silence Ruff ARG001 for unused fake-path args. Keep the signature for the dispatcher but explicitly discard the unused args.
```diff
 @torch_fake_quant_int4_linear.register_fake
 def _fake(
     input: torch.Tensor,
     weight_quantized: torch.Tensor,
     bias: Optional[torch.Tensor],
     input_scale: List[torch.Tensor],
     weight_scale: List[torch.Tensor],
     input_zp: List[torch.Tensor],
     weight_zp: List[torch.Tensor],
 ) -> torch.Tensor:
+    # Keep the signature for registration; explicitly discard the unused args.
+    del bias, input_scale, weight_scale, input_zp, weight_zp
     N = weight_quantized.shape[-2]
     return torch.empty((*input.shape[:-1], N), dtype=input.dtype, device=input.device)
```

If you prefer cleaner code, configure Ruff to ignore ARG001 for these registered fake handlers.
tensorrt_llm/_torch/auto_deploy/custom_ops/int4.py.bak (1)
71-104: Second fallback path: verify the block reshaping logic and add a scale clamp. The view(-1, block_size // 2) assumes contiguous packing per block; confirm the packing layout or reshape by (out, in//block_size, block_size//2) to avoid cross-row mixing. Also clamp scales:

```diff
-    first_half = first_half.view(-1, block_size // 2) / weight_scale.view(-1, 1)
-    second_half = second_half.view(-1, block_size // 2) / weight_scale.view(-1, 1)
+    ws = weight_scale.reshape(-1, 1)
+    eps = torch.finfo(ws.dtype).tiny
+    ws = torch.clamp(ws, min=eps)
+    first_half = first_half.view(-1, block_size // 2) / ws
+    second_half = second_half.view(-1, block_size // 2) / ws
```

If packing is by (out, in//block, block), prefer:

```python
first_half = first_half.view(out_features, in_features // block_size, block_size // 2) / weight_scale.unsqueeze(-1)
second_half = second_half.view(out_features, in_features // block_size, block_size // 2) / weight_scale.unsqueeze(-1)
```

Please confirm the layout with the checkpoint writer.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
tensorrt_llm/_torch/auto_deploy/config/default.yaml (1 hunks)
tensorrt_llm/_torch/auto_deploy/custom_ops/int4.py.bak (1 hunks)
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py (1 hunks)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1 hunks)
tensorrt_llm/_torch/auto_deploy/models/patches/modelopt.py (1 hunks)
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py (2 hunks)
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_quant.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tensorrt_llm/_torch/auto_deploy/models/hf.py
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_quant.py
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py
tensorrt_llm/_torch/auto_deploy/models/patches/modelopt.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tensorrt_llm/_torch/auto_deploy/models/hf.py
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_quant.py
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py
tensorrt_llm/_torch/auto_deploy/models/patches/modelopt.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tensorrt_llm/_torch/auto_deploy/models/hf.py
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_quant.py
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py
tensorrt_llm/_torch/auto_deploy/models/patches/modelopt.py
🧬 Code graph analysis (3)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
model (54-56)
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_quant.py (1)
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py (1)
torch_fake_quant_int4_linear (282-327)
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py (4)
tensorrt_llm/_torch/auto_deploy/utils/pattern_matcher.py (3)
ADPatternMatcherPass (61-67), register_ad_pattern (99-182), apply (64-67)
tensorrt_llm/_torch/auto_deploy/models/patches/modelopt.py (1)
tensor_quant_legacy (14-54)
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py (1)
torch_fake_quant_int4_linear (282-327)
tensorrt_llm/_torch/auto_deploy/transform/interface.py (3)
TransformRegistry (381-409), register (387-394), BaseTransform (139-378)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/auto_deploy/models/hf.py
185-185: Unused noqa directive (non-enabled: E402). Remove unused noqa directive. (RUF100)
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py
606-606: Unused method argument: cm (ARG002)
607-607: Unused method argument: factory (ARG002)
608-608: Unused method argument: shared_config (ARG002)
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_quant.py
288-288: Unused function argument: input_zp (ARG001)
289-289: Unused function argument: weight_zp (ARG001)
334-334: Unused function argument: bias (ARG001)
335-335: Unused function argument: input_scale (ARG001)
336-336: Unused function argument: weight_scale (ARG001)
337-337: Unused function argument: input_zp (ARG001)
338-338: Unused function argument: weight_zp (ARG001)
tensorrt_llm/_torch/auto_deploy/models/patches/modelopt.py
30-30: Avoid specifying long messages outside the exception class (TRY003)
61-61: Unused function argument: num_bits (ARG001)
62-62: Unused function argument: unsigned (ARG001)
63-63: Unused function argument: narrow_range (ARG001)
78-83: try-except-pass detected, consider logging the exception (S110)
78-78: Do not catch blind exception: Exception (BLE001)
98-98: Do not call setattr with a constant attribute value. It is not any safer than normal property access. Replace setattr with assignment. (B010)
100-100: Do not call setattr with a constant attribute value. It is not any safer than normal property access. Replace setattr with assignment. (B010)
109-109: Do not call setattr with a constant attribute value. It is not any safer than normal property access. Replace setattr with assignment. (B010)
🔇 Additional comments (3)
tensorrt_llm/_torch/auto_deploy/config/default.yaml (1)
48-49: INT4 pass placement looks fine; confirm the desired ordering with other quant passes. Runs after optimize_rope and before the fp8/nvfp4 passes, which seems correct. Please confirm no unintended interactions with FP8/NVFP4 transforms for mixed-algo graphs.
tensorrt_llm/_torch/auto_deploy/transform/library/quantization.py (1)
561-565: Replacement op arguments: confirm layout invariants. Ensure [pre_quant_scale] and [amax] match what torch_fake_quant_int4_linear expects (lists, not tensors). Looks consistent with the custom op.
199-231: Reference INT4 path matches the custom op math. LGTM.
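As background for the reference-vs-op comparison above, a generic symmetric INT4 fake-quantization helper looks roughly like this (an illustrative sketch of the general technique; the narrow [-7, 7] range and rounding choice are assumptions, not necessarily the exact math used in the test):

```python
import torch

def fake_quant_int4_symmetric(x: torch.Tensor, amax: torch.Tensor) -> torch.Tensor:
    """Quantize to signed 4-bit and immediately dequantize (fake quantization).

    amax broadcasts against x, so it can be per-tensor, per-channel, or per-block.
    """
    scale = amax / 7.0                              # assumed narrow range [-7, 7]
    q = torch.clamp(torch.round(x / scale), -7, 7)  # snap to the integer grid
    return q * scale                                # back to the original scale

x = torch.randn(4, 16)
amax = x.abs().amax(dim=-1, keepdim=True)           # per-row amax for illustration
x_fq = fake_quant_int4_symmetric(x, amax)
assert (x - x_fq).abs().max() <= amax.max() / 7.0   # error bounded by one quantization step
```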
Force-pushed from ccf1244 to ba88144
Signed-off-by: Frida Hou <[email protected]>
Delete tensorrt_llm/_torch/auto_deploy/models/patches/mxfp4.py
Signed-off-by: Frida Hou <[email protected]>
Delete tensorrt_llm/_torch/auto_deploy/config/default.bak.yaml
Signed-off-by: Frida Hou <[email protected]>
Delete tensorrt_llm/_torch/auto_deploy/custom_ops/int4.py
Signed-off-by: Frida Hou <[email protected]>
update torch_fake_quant_int4_linear to use standard interface
Signed-off-by: Frida Hou <[email protected]>
minor
Signed-off-by: Frida Hou <[email protected]>
Delete tensorrt_llm/_torch/auto_deploy/custom_ops/int4.py.bak
Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Fridah-nv <[email protected]>
finalize int4 unified checkpoint e2e support
Signed-off-by: Fridah-nv <[email protected]>
minor: update model kwarg to correctly set torch dtype
Signed-off-by: Fridah-nv <[email protected]>
minor: remove unused util
Signed-off-by: Fridah-nv <[email protected]>
minor: update comment
Signed-off-by: Fridah-nv <[email protected]>
Signed-off-by: Fridah-nv <[email protected]>
Signed-off-by: Fridah-nv <[email protected]>
Force-pushed from ba88144 to 7642f5f
/bot run --disable-fail-fast
PR_Github #20301 [ run ] triggered by Bot
PR_Github #20301 [ run ] completed with state
/bot run
PR_Github #20310 [ run ] triggered by Bot
PR_Github #20310 [ run ] completed with state
/bot run
PR_Github #20329 [ run ] triggered by Bot
PR_Github #20329 [ run ] completed with state
/bot run
PR_Github #20342 [ run ] triggered by Bot
PR_Github #20342 [ run ] completed with state
/bot run
PR_Github #20398 [ run ] triggered by Bot
PR_Github #20398 [ run ] completed with state
/bot run
PR_Github #20413 [ run ] triggered by Bot
PR_Github #20413 [ run ] completed with state
/bot run
PR_Github #20419 [ run ] triggered by Bot
PR_Github #20419 [ run ] completed with state
…eploy (NVIDIA#7770) Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]> Signed-off-by: Faradawn Yang <[email protected]>
…eploy (NVIDIA#7770) Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]>
…eploy (NVIDIA#7770) Signed-off-by: Frida Hou <[email protected]> Signed-off-by: Fridah-nv <[email protected]> Signed-off-by: Faradawn Yang <[email protected]>
This PR does the following:
Tests E2E:
Output
Summary by CodeRabbit
New Features
Chores
Tests
Description
Test Coverage
Tested with
Checkpoint produced with the ModelOpt llm_ptq example, without KV-cache quantization, and with the model saved by full_model.save_pretrained(export_path).
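For reference, producing a checkpoint with this layout typically looks like the sketch below. Only modelopt_state.pth and save_pretrained are confirmed by this PR; mtq.quantize, mtq.INT4_AWQ_CFG, and mto.modelopt_state are assumed from the public nvidia-modelopt API used by the llm_ptq example, and the model name and export path are placeholders.

```python
import os

import torch
import modelopt.torch.opt as mto
import modelopt.torch.quantization as mtq  # assumed API, as used by the llm_ptq example
from transformers import AutoModelForCausalLM

export_path = "./int4_awq_ckpt"   # hypothetical output directory
model_name = "facebook/opt-125m"  # placeholder model for illustration

full_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

def forward_loop(model):
    """Calibration loop; feed a few representative batches through the model here."""
    ...

# INT4-AWQ fake quantization via ModelOpt (config name assumed from the ModelOpt examples).
full_model = mtq.quantize(full_model, mtq.INT4_AWQ_CFG, forward_loop)

# Save the HF weights plus the ModelOpt state that AutoDeploy restores at build time.
full_model.save_pretrained(export_path)
torch.save(mto.modelopt_state(full_model), os.path.join(export_path, "modelopt_state.pth"))
```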
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.