[None][fix] : Fix OOM issue when dp padding is enabled #8052
Conversation
Signed-off-by: peaceh <[email protected]>
/bot run
📝 Walkthrough
Refines the chunking logic in the fused MoE `forward_impl` for distributed mode: when DP with padding is enabled and `parallel_size > 1`, `num_chunks` is computed from `max(all_rank_num_tokens)` and the world size; otherwise the original `num_rows`-based computation remains. No public APIs changed.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant FusedMoE as FusedMoE.forward_impl
    participant DPState as DP Context
    participant Chunker as Chunking Logic
    Caller->>FusedMoE: forward_impl(input, ...)
    FusedMoE->>DPState: check use_dp, parallel_size, use_dp_padding
    alt DP with padding enabled and parallel_size > 1
        DPState-->>FusedMoE: all_rank_num_tokens
        FusedMoE->>Chunker: compute num_chunks = ceil(max(all_rank_num_tokens)*num_ranks / moe_max_num_tokens)
        note right of Chunker: New branch for DP padding
    else
        FusedMoE->>Chunker: compute num_chunks = ceil(num_rows / moe_max_num_tokens)
        note right of Chunker: Original behavior
    end
    FusedMoE->>FusedMoE: split inputs into chunks
    FusedMoE-->>Caller: proceed with existing processing per chunk
```
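For reference, a minimal Python sketch of the chunk-count selection described in the walkthrough above. The helper name `compute_num_chunks` is hypothetical; the parameter names follow the walkthrough (`use_dp`, `use_dp_padding`, `parallel_size`, `all_rank_num_tokens`, `moe_max_num_tokens`, `num_rows`), and the actual logic inside `FusedMoE.forward_impl` may differ in detail.

```python
import math
from typing import List, Optional


def compute_num_chunks(num_rows: int,
                       moe_max_num_tokens: int,
                       use_dp: bool,
                       use_dp_padding: bool,
                       parallel_size: int,
                       all_rank_num_tokens: Optional[List[int]] = None) -> int:
    """Illustrative sketch of the chunk-count selection, not the real implementation.

    With DP padding, every rank is padded up to the largest per-rank token
    count, so the work to chunk is max(all_rank_num_tokens) * num_ranks
    rather than the raw local num_rows.
    """
    if use_dp and use_dp_padding and parallel_size > 1 and all_rank_num_tokens:
        padded_total = max(all_rank_num_tokens) * len(all_rank_num_tokens)
        return math.ceil(padded_total / moe_max_num_tokens)
    # Original behavior: chunk by the local (unpadded) row count.
    return math.ceil(num_rows / moe_max_num_tokens)
```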
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
PR_Github #20215 [ run ] triggered by Bot
Signed-off-by: peaceh <[email protected]>
/bot run
PR_Github #20222 [ run ] triggered by Bot
PR_Github #20215 [ run ] completed with state
PR_Github #20222 [ run ] completed with state
Signed-off-by: peaceh <[email protected]> Signed-off-by: Faradawn Yang <[email protected]>
Signed-off-by: peaceh <[email protected]>
Signed-off-by: peaceh <[email protected]> Signed-off-by: Faradawn Yang <[email protected]>
@coderabbitai summary
Description
When dp padding is enabled, the number of chunks should be re-calculated from the padded token count rather than from the original all_rank_num_tokens. The OOM issue was observed on SM120 because dp padding is only enabled on that platform; see https://github.com/NVIDIA/TensorRT-LLM/pull/7965/files
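As an illustration of why the padded count matters (the numbers below are hypothetical, not taken from the PR): if one rank carries far more tokens than the others, padding inflates the total well beyond the sum of the original per-rank counts, so chunking by the unpadded total underestimates the memory actually allocated per chunk.

```python
import math

# Hypothetical values, only to illustrate the effect of dp padding on chunking.
all_rank_num_tokens = [100, 100, 100, 900]
moe_max_num_tokens = 1024

unpadded_total = sum(all_rank_num_tokens)                            # 1200
padded_total = max(all_rank_num_tokens) * len(all_rank_num_tokens)   # 3600

print(math.ceil(unpadded_total / moe_max_num_tokens))  # 2 chunks: underestimates the padded workload
print(math.ceil(padded_total / moe_max_num_tokens))    # 4 chunks: matches the padded allocation
```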