
Conversation

Contributor

@JC-ut0 JC-ut0 commented Oct 14, 2025

What this PR does / why we need it?

In memory of #2610: in the PD disaggregation scenario, the first token inferred after the D node receives the KV cache follows eager mode.

Fixes:
When running MTP torchair graph mode with prefilling-decoding disaggregation, if all requests processed by the D node were just transmitted from the P node, the torchair graph breaks.

Reason: during PD disaggregation, the P node transmits only the KV cache and the prompt to the D node, not the tokens it actually inferred (neither the main-model token nor the MTP tokens). The D node therefore treats such a request as one without MTP tokens (seq_len=1).
Upstream vLLM does not hit this graph-mode issue because its attention runs the decode phase with seq_len=1 for every request in the batch.
We hit it because torchair graph mode pads assuming 2 tokens per request. When a batch mixes seq_len=1 and seq_len=2 requests, padding is appended at the end; but if every request the D node receives has seq_len=1, the batch cannot be padded into a shape that satisfies the attention's FIA operator constraints.
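To make the padding arithmetic concrete, here is a minimal sketch with assumed numbers (the 4 request slots and the 2-tokens-per-request budget are illustrative, not taken from this PR's code):

```python
# A torchair graph captured for 4 request slots at 2 tokens per request
# (1 main-model token + 1 MTP token) expects exactly 8 input tokens.
NUM_REQ_SLOTS = 4
TOKENS_PER_REQ = 2

def pad_tokens_needed(seq_lens: list[int]) -> int:
    """Tokens that must be appended at the end to fill the captured shape."""
    return NUM_REQ_SLOTS * TOKENS_PER_REQ - sum(seq_lens)

# Mixed batch: one request arrived without its MTP token (seq_len=1),
# and a single pad token appended at the end is enough.
print(pad_tokens_needed([2, 2, 2, 1]))  # 1

# Worst case right after a PD transfer: every request is seq_len=1, so half
# of the graph's token budget would have to be padding -- a layout the FIA
# operator's constraints reject.
print(pad_tokens_needed([1, 1, 1, 1]))  # 4
```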

Solution:

  • The KV consumer adds extra torchair graph padding to avoid breaking the FIA graph constraints (the approach this PR implements; see the sketch below).
  • The KV producer transmits the correct tokens to the KV consumer, so our graph-mode constraints are never broken and all logic matches mixed PD deployment. Since we use the community scheduler, this modification requires patching the vLLM scheduler, but in theory performance should be better. (Maybe later.)
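A rough sketch of the implemented option's idea, with hypothetical names and policy (`expand_graph_batch_sizes` and the doubling rule are assumptions for illustration; the PR works on the `torchair_graph_batch_sizes` attribute):

```python
def expand_graph_batch_sizes(base_sizes: list[int],
                             is_kv_consumer: bool) -> list[int]:
    """Return the set of torchair graph batch sizes to capture.

    For a KV consumer, an entire batch may consist of seq_len=1 requests
    fresh from the P node, so extra sizes are captured to give such a
    batch a graph shape it can legally be padded into.
    """
    sizes = set(base_sizes)
    if is_kv_consumer:
        # Illustrative policy: also capture a graph large enough to absorb
        # the all-seq_len=1 case for each base size.
        sizes |= {s * 2 for s in base_sizes}
    return sorted(sizes)

print(expand_graph_batch_sizes([4, 8, 16], is_kv_consumer=True))
# [4, 8, 16, 32]
```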

Does this PR introduce any user-facing change?

How was this patch tested?


👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@JC-ut0 JC-ut0 force-pushed the mtp_torchair_pd_fix branch from 7e2a779 to 23231fe on October 14, 2025 09:39
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a fix for an MTP torchair graph mode issue in prefilling-decoding disaggregation scenarios. The change involves adding extra padding to the torchair graph batch sizes for KV consumers to avoid breaking FIA graph constraints. However, the implementation has a logical flaw where self.torchair_graph_batch_sizes is modified and then reused, leading to incorrect calculations. I've suggested a refactoring to correct this logic.
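For readers without the diff open, the flagged issue is the generic mutate-then-reuse pattern; a hypothetical reduction (class and attribute usage assumed, not the actual vllm-ascend code):

```python
class RunnerSketch:
    def __init__(self, sizes: list[int]) -> None:
        self.torchair_graph_batch_sizes = sizes

    def apply_consumer_padding_buggy(self) -> None:
        # The attribute is overwritten first...
        self.torchair_graph_batch_sizes = [
            s * 2 for s in self.torchair_graph_batch_sizes
        ]
        # ...so anything derived afterwards reads the already-doubled
        # sizes instead of the originals.
        self.max_batch_size = max(self.torchair_graph_batch_sizes)

    def apply_consumer_padding_fixed(self) -> None:
        # Read the original values once and derive everything from them.
        original = self.torchair_graph_batch_sizes
        self.max_batch_size = max(original)
        self.torchair_graph_batch_sizes = [s * 2 for s in original]
```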

@JC-ut0 JC-ut0 force-pushed the mtp_torchair_pd_fix branch from 23231fe to c0aab7f on October 14, 2025 09:54
@MengqingCao MengqingCao added the ready (read for review) and ready-for-test (start test by label for PR) labels on Oct 14, 2025
@JC-ut0 JC-ut0 force-pushed the mtp_torchair_pd_fix branch from c0aab7f to e8f85eb on October 15, 2025 06:13
Signed-off-by: xuyexiong <[email protected]>
@JC-ut0 JC-ut0 force-pushed the mtp_torchair_pd_fix branch from e8f85eb to cc90849 on October 15, 2025 06:23
Collaborator

@linfeng-yuan linfeng-yuan left a comment


LGTM. Please check the logic of max_num_seq updating here.

Currently, DeepSeek can also be run with aclgraph in full graph mode; I think you can submit another PR to update the MTP padding size for aclgraph as well (full graph only).

@yiz-liu yiz-liu merged commit b0ae203 into vllm-project:main Oct 16, 2025
17 checks passed
