[None][feat] AutoDeploy add autotuning when capturing cudagraphs #8120
Conversation
Signed-off-by: Suyog Gupta <[email protected]>
📝 Walkthrough
Introduces autotuner integration into the CUDA graph capture warm-up: imports autotune from tensorrt_llm._torch.autotuner and wraps the warm-up phase in _capture_one_graph with both CudaGraphWarmUpPhase and autotune().
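For readers unfamiliar with the backend, a minimal sketch of the wrapped warm-up described above might look like the following. The CudaGraphWarmUpPhase import path, the function name, and the body are assumptions for illustration, not the actual torch_cudagraph.py code.

```python
import torch

from tensorrt_llm._torch.autotuner import autotune
# Import path assumed for illustration; the real location may differ.
from tensorrt_llm._torch.utils import CudaGraphWarmUpPhase


def capture_one_graph(model, args, kwargs, num_warmup_iters=3):
    """Illustrative warm-up plus capture flow, not the backend implementation."""
    # Warm up under both contexts so kernel autotuning runs before,
    # not during, CUDA graph capture.
    with CudaGraphWarmUpPhase(), autotune():
        for _ in range(num_warmup_iters):
            model(*args, **kwargs)
    torch.cuda.synchronize()

    # Capture the forward pass into a replayable CUDA graph.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        out = model(*args, **kwargs)
    return graph, out
```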
Sequence Diagram(s)
```mermaid
sequenceDiagram
autonumber
participant Caller
participant TorchCudaGraph as torch_cudagraph backend
participant WarmUp as CudaGraphWarmUpPhase
participant Autotune as autotune()
Caller->>TorchCudaGraph: _capture_one_graph(...)
activate TorchCudaGraph
Note over TorchCudaGraph: Begin CUDA graph capture setup
TorchCudaGraph->>WarmUp: enter
activate WarmUp
WarmUp->>Autotune: enter
activate Autotune
Note over WarmUp,Autotune: Warm-up iterations with autotuning
Autotune-->>WarmUp: exit
deactivate Autotune
WarmUp-->>TorchCudaGraph: exit
deactivate WarmUp
TorchCudaGraph->>TorchCudaGraph: Capture CUDA graph
TorchCudaGraph-->>Caller: Return captured graph
deactivate TorchCudaGraph
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py (1)
Lines 57-58: Persist autotune profiles by specifying cache_path. The default cache_path=None disables loading and saving of tuning results, so autotuning reruns on every graph capture and every restart. Pass a filesystem path to autotune(cache_path=…) to enable profiling_cache.load/save and avoid redundant tuning.
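A small sketch of the suggested fix, assuming autotune accepts a cache_path keyword as described above (the exact signature should be checked against tensorrt_llm/_torch/autotuner.py; the path and helper name are illustrative):

```python
from tensorrt_llm._torch.autotuner import autotune

# Illustrative path; any writable location works.
AUTOTUNE_CACHE = "/tmp/autodeploy_autotune_cache.json"


def warm_up_with_persistent_tuning(run_warmup):
    # Loading and saving the profiling cache avoids re-tuning
    # on every graph capture and restart.
    with autotune(cache_path=AUTOTUNE_CACHE):
        run_warmup()
```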
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
🧬 Code graph analysis (1)
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py (1)
tensorrt_llm/_torch/autotuner.py (1)
autotune (210-242)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py (1)
Line 13: LGTM! The import follows the coding guidelines by maintaining the module namespace structure.
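For reference, the namespace-preserving import style from the guidelines above looks roughly like this (illustrative snippet, not taken from the PR diff):

```python
# Preferred per the guidelines: import the module, then qualify uses.
from tensorrt_llm._torch import autotuner

with autotuner.autotune():
    ...  # warm-up iterations

# The import used by the PR, per the walkthrough above:
from tensorrt_llm._torch.autotuner import autotune
```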
Signed-off-by: Suyog Gupta <[email protected]>
/bot run
PR_Github #20547 [ run ] triggered by Bot
Signed-off-by: Suyog Gupta <[email protected]>
/bot run
PR_Github #20548 [ run ] triggered by Bot
PR_Github #20547 [ run ] completed with state
PR_Github #20548 [ run ] completed with state
/bot run
PR_Github #20554 [ run ] triggered by Bot
PR_Github #20554 [ run ] completed with state
/bot run
PR_Github #20558 [ run ] triggered by Bot
PR_Github #20558 [ run ] completed with state
/bot run
PR_Github #20570 [ run ] triggered by Bot
PR_Github #20570 [ run ] completed with state
…DIA#8120) Signed-off-by: Suyog Gupta <[email protected]>
Enable autotuning when capturing cudagraphs.
Perf results:
Mixtral 8x7B, H200, ISL/OSL=128/128
AutoDeploy without autotuning
AutoDeploy with autotuning
Pytorch:
Summary by CodeRabbit