[https://nvbugs/5488582][fix] Avoid unexpected Triton recompilation in DG fused_moe. #7495
Walkthrough: Converted a Triton kernel parameter from a compile-time constant (tl.constexpr) to a runtime argument and updated its usage and invocation accordingly within tensorrt_llm/_torch/modules/fused_moe/fused_moe_deepgemm.py.
Triton triggers unexpected recompilation of the DG fused_moe kernel _preprocess_after_permute_kernel because the token number was declared as a compile-time constant, which Triton may treat as a specialization, so a new token number can force a new compilation. Passing it as a normal int value instead solves the issue. Signed-off-by: Yukun He <[email protected]>
…n DG fused_moe. (NVIDIA#7495) Signed-off-by: Yukun He <[email protected]>
Triton will trigger unexpected recompilation for DG fused_moe _preprocess_after_permute_kernel, where the token number was defined as a constant, which might be treated as a specialization by Triton. Replacing it with a normal int value will solve the issue.
At concurrency 4096, throughput with the fix vs. without the fix is 51029.35 vs. 44774.58 TOPS, roughly 14% higher with the fix.
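The recompilation behavior described above can be illustrated with a toy model of a JIT compile cache. This is only a sketch under stated assumptions, not Triton's actual caching code: the class ToyJit and the parameter names TOTAL_TOKENS, total_tokens, and BLOCK are hypothetical and do not come from the PR. The model assumes that the value of a compile-time constant is part of the cache key, while a runtime integer contributes only its type. Real Triton may additionally specialize runtime integers on properties such as being equal to 1 or divisible by 16, which the sketch ignores.

```python
class ToyJit:
    """Toy stand-in for a JIT compile cache (hypothetical, not Triton's code)."""

    def __init__(self, constexpr_names=()):
        self.constexpr_names = set(constexpr_names)
        self.cache = {}
        self.compile_count = 0

    def launch(self, **kwargs):
        # Compile-time constants contribute their VALUE to the cache key;
        # runtime arguments contribute only their type name.
        key = tuple(sorted(
            (name, value if name in self.constexpr_names else type(value).__name__)
            for name, value in kwargs.items()
        ))
        if key not in self.cache:
            # Cache miss: this models one (expensive) kernel compilation.
            self.compile_count += 1
            self.cache[key] = f"binary#{self.compile_count}"
        return self.cache[key]


# Token counts that vary from request to request (made-up values).
token_counts = [17, 33, 65, 129, 33, 17]

# Before the fix: the token count is a compile-time constant.
before = ToyJit(constexpr_names={"TOTAL_TOKENS"})
for n in token_counts:
    before.launch(TOTAL_TOKENS=n, BLOCK=128)

# After the fix: the token count is a normal runtime int.
after = ToyJit(constexpr_names={"BLOCK"})
for n in token_counts:
    after.launch(total_tokens=n, BLOCK=128)

print(before.compile_count, after.compile_count)  # prints "4 1"
```

In this model, each distinct token count costs one compilation when it is a constant, while the runtime-argument version compiles once and reuses the same binary. That matches the direction of the fix, although the real cost and cache key details depend on the Triton version in use.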