[None][doc] Tech blog: Combining Guided Decoding and Speculative Decoding: Making CPU and GPU Cooperate Seamlessly #7864
Conversation
📝 Walkthrough

Adds a new Tech Blog entry to README and introduces a detailed documentation article on combining guided decoding with speculative decoding, covering design, data flow, CUDA Graph capturability via host callbacks, masking, state management, and benchmarking. No code or public API changes.
Sequence Diagram(s)

sequenceDiagram
autonumber
actor User
participant Executor as Decoder Executor
participant GPU as Model Forward (GPU)
participant CPU as Grammar Engine (CPU)
rect rgb(238,245,255)
note over Executor,GPU: Speculative Decoding Loop
User->>Executor: Start request
Executor->>GPU: Draft tokens (1 or 2-model)
par Overlap
GPU-->>Executor: Draft logits
Executor->>CPU: Compute grammar mask for draft/target
CPU-->>Executor: Token mask(s), updated grammar state
end
Executor->>Executor: Apply mask to logits (disallow tokens)
alt Draft verified
Executor->>Executor: Accept draft token(s), advance grammar state
else Draft rejected
Executor->>Executor: Roll back draft, reuse valid prefix
end
Executor->>GPU: Target step with masked logits
GPU-->>Executor: Next token
Executor->>CPU: Advance grammar with accepted token
end
Executor-->>User: Streamed tokens
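The "Apply mask to logits" step in the diagram above is, at bottom, a masked fill: every token the grammar disallows has its logit forced to negative infinity before sampling, for both draft and target positions. Below is a minimal CUDA sketch of that step; the kernel name, the packed-bitmask layout, and the launch helper are illustrative assumptions, not the blog's or TensorRT-LLM's actual implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>  // INFINITY

// Hypothetical kernel: one block per row of logits (one row per draft or
// target position being verified). `bitmask` packs one bit per vocabulary
// token (1 = allowed by the grammar), as a CPU-side grammar engine such as
// XGrammar can produce.
__global__ void applyTokenMask(float* logits,                // [numRows, vocabSize]
                               const unsigned int* bitmask,  // [numRows, ceil(vocabSize/32)]
                               int vocabSize)
{
    const int row = blockIdx.x;
    const int words = (vocabSize + 31) / 32;
    for (int tok = threadIdx.x; tok < vocabSize; tok += blockDim.x)
    {
        const unsigned int word = bitmask[row * words + tok / 32];
        const bool allowed = (word >> (tok % 32)) & 1u;
        if (!allowed)
        {
            // Disallowed token: force the logit to -inf so it can never be sampled.
            logits[row * vocabSize + tok] = -INFINITY;
        }
    }
}

// Host-side launch sketch (names are illustrative, not TensorRT-LLM's API).
void launchApplyTokenMask(float* dLogits, const unsigned int* dBitmask,
                          int numRows, int vocabSize, cudaStream_t stream)
{
    applyTokenMask<<<numRows, 256, 0, stream>>>(dLogits, dBitmask, vocabSize);
}
```

Packing the mask as one bit per token keeps the per-step host-to-device transfer at roughly vocab_size / 8 bytes per row, which helps when a fresh mask must be produced and copied for every draft and target position.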
sequenceDiagram
autonumber
participant Exec as Executor
participant CUDAGraph as CUDA Graph
participant HostCB as cudaLaunchHostFunc Callback
participant Py as Python HostFunc (capturable)
participant CPU as Grammar Engine
participant GPU as Model Stream
note over CUDAGraph,HostCB: Capturable guided decoding
Exec->>CUDAGraph: Capture graph (fixed buffers, slots)
CUDAGraph->>GPU: Enqueue model kernels
CUDAGraph->>HostCB: Schedule host callback
HostCB->>Py: Invoke hostfunc (GIL-released)
Py->>CPU: Compute/restore grammar state, build masks
CPU-->>Py: Masks + state snapshot
Py-->>HostCB: Write masks into fixed buffers
HostCB-->>CUDAGraph: Callback done
CUDAGraph-->>Exec: Replay complete (overlapped streams)
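The second diagram relies on the fact that cudaLaunchHostFunc calls issued during stream capture become host nodes in the CUDA graph, so each replay re-runs the CPU-side grammar work at exactly the captured point in the stream and writes its results into fixed-address buffers. The self-contained sketch below shows only that capture/replay mechanism; the callback body and buffer type are placeholders (the article describes a Python hostfunc invoked with the GIL released), not the actual implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Fixed-address slot the callback fills on every graph replay. In the real
// design this would be a pinned token-mask buffer that the Python hostfunc
// populates from the grammar engine; here it is a toy placeholder.
struct MaskSlot
{
    int step;
};

// Host function node: runs on a CUDA-managed thread at replay time, at the
// same position in the stream where it was captured. It must not call any
// CUDA APIs; it may only touch host-visible memory.
void CUDART_CB fillMaskHostFunc(void* userData)
{
    auto* slot = static_cast<MaskSlot*>(userData);
    slot->step += 1;  // stand-in for "advance grammar state and write the mask"
}

int main()
{
    // Error checking omitted for brevity.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    static MaskSlot slot{0};  // address must stay valid for the graph's lifetime
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;

    // Capture: the host callback is recorded as a graph node, not executed now.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    cudaLaunchHostFunc(stream, fillMaskHostFunc, &slot);
    // ... model kernels that read the fixed mask buffer would be captured here ...
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graphExec, graph, 0);  // CUDA 12.x signature

    // Replay: each launch re-runs the host callback at the captured point.
    for (int i = 0; i < 3; ++i)
    {
        cudaGraphLaunch(graphExec, stream);
    }
    cudaStreamSynchronize(stream);
    printf("host callback ran %d times\n", slot.step);  // expected: 3

    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    return 0;
}
```

Because the callback runs on a CUDA-internal thread and may not call CUDA APIs, it can only read and write host-visible memory that the captured kernels then consume; this is where the fixed buffers and slot management in the design come from, and it is the boundary that the article's troubleshooting sections (host/callback data races, GIL and CUDA-mutex deadlocks) deal with.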
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (8)
docs/source/blogs/tech_blog/blog12_Combining_Guided_Decoding_and_Speculative_Decoding.md (8)
3-3: Avoid “emphasis as heading” (markdownlint MD036) for the author line. Use plain text or a small “Authors:” prefix instead of italic-only.

-*By NVIDIA TensorRT LLM Team and XGrammar Team*
+Authors: NVIDIA TensorRT LLM Team and XGrammar Team
21-23: Fix invalid ToC anchors (markdownlint MD051). Section slugs use “troubleshooting”, not “trouble-shooting”.

- - [Troubleshooting: Data Race between Host and CUDA Callback](#trouble-shooting-data-race-between-host-and-cuda-callback)
- - [Troubleshooting: Deadlock by GIL and CUDA Mutex](#trouble-shooting-deadlock-by-gil-and-cuda-mutex)
+ - [Troubleshooting: Data Race between Host and CUDA Callback](#troubleshooting-data-race-between-host-and-cuda-callback)
+ - [Troubleshooting: Deadlock by GIL and CUDA Mutex](#troubleshooting-deadlock-by-gil-and-cuda-mutex)
50-56: Add alt text to Figure 1 image (markdownlint MD045).

- <img src="https://github.com/media/tech_blog12_constrained_decoding_pipeline_overlap.png" width="600">
+ <img src="https://github.com/media/tech_blog12_constrained_decoding_pipeline_overlap.png" width="600" alt="Guided decoding timelines with and without CPU/GPU overlap">
63-69: Add alt text to Figure 2 image (markdownlint MD045).

- <img src="https://github.com/media/tech_blog12_one_model_vs_two_model.png" width="600">
+ <img src="https://github.com/media/tech_blog12_one_model_vs_two_model.png" width="600" alt="GPU timelines: one-model vs two-model speculative decoding">
140-146: Add alt text to Figure 4 image (markdownlint MD045).

- <img src="https://github.com/media/tech_blog12_cpu_gpu_synchronization_for_multiple_steps_by_cuda_callback.png" width="800">
+ <img src="https://github.com/media/tech_blog12_cpu_gpu_synchronization_for_multiple_steps_by_cuda_callback.png" width="800" alt="CPU-GPU synchronization across multiple steps via CUDA callbacks">
262-266: Add alt text to Figures 5–8 (markdownlint MD045).

- <img src="https://github.com/media/tech_blog12_pareto_curve_json_mode_eval_llama_3.1_8b.png" width="600">
+ <img src="https://github.com/media/tech_blog12_pareto_curve_json_mode_eval_llama_3.1_8b.png" width="600" alt="Pareto curve: JSON Mode Eval, LLaMA 3.1 8B on H200">
- <img src="https://github.com/media/tech_blog12_pareto_curve_json_mode_eval_llama_3.3_70b.png" width="600">
+ <img src="https://github.com/media/tech_blog12_pareto_curve_json_mode_eval_llama_3.3_70b.png" width="600" alt="Pareto curve: JSON Mode Eval, LLaMA 3.3 70B on H200">
- <img src="https://github.com/media/tech_blog12_pareto_curve_json_schema_bench_llama_3.1_8b.png" width="600">
+ <img src="https://github.com/media/tech_blog12_pareto_curve_json_schema_bench_llama_3.1_8b.png" width="600" alt="Pareto curve: JSON Schema Bench, LLaMA 3.1 8B on H200">
- <img src="https://github.com/media/tech_blog12_pareto_curve_json_schema_bench_llama_3.3_70b.png" width="600">
+ <img src="https://github.com/media/tech_blog12_pareto_curve_json_schema_bench_llama_3.3_70b.png" width="600" alt="Pareto curve: JSON Schema Bench, LLaMA 3.3 70B on H200">

Also applies to: 271-274, 283-285, 290-292
32-37: Unordered list marker style differs from repo lint expectations (markdownlint MD004). Switch “*” to “-” in lists or adjust the markdownlint config for this doc. Given the linter output, prefer dashes.
Also applies to: 74-79, 81-82, 123-137, 198-201, 221-223, 229-231, 308-309
151-155: Add a commit SHA to versioned deep links for reproducibility. Links point to NVIDIA/TensorRT-LLM v1.1.0rc5; include the exact commit SHA used for the benchmarks to avoid future drift.
Verification: referenced image assets are present; "Troubleshooting" anchors found at lines 21–22.
Applies to lines 151–155 and 190–206.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (8)
docs/source/blogs/media/tech_blog12_constrained_decoding_pipeline_overlap.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog12_cpu_gpu_synchronization_for_multiple_steps.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog12_cpu_gpu_synchronization_for_multiple_steps_by_cuda_callback.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog12_one_model_vs_two_model.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog12_pareto_curve_json_mode_eval_llama_3.1_8b.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog12_pareto_curve_json_mode_eval_llama_3.3_70b.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog12_pareto_curve_json_schema_bench_llama_3.1_8b.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog12_pareto_curve_json_schema_bench_llama_3.3_70b.png is excluded by !**/*.png
📒 Files selected for processing (2)
README.md (1 hunks)
docs/source/blogs/tech_blog/blog12_Combining_Guided_Decoding_and_Speculative_Decoding.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/source/blogs/tech_blog/blog12_Combining_Guided_Decoding_and_Speculative_Decoding.md
3-3: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)

21-21, 22-22: Link fragments should be valid (MD051, link-fragments)

32-36, 74-77, 81-82, 123-124, 128-138, 198-201, 221-223, 229-231, 308-309: Unordered list style. Expected: dash; Actual: asterisk (MD004, ul-style)

52, 65, 86, 142, 264, 272, 283, 290: Images should have alternate text (alt text) (MD045, no-alt-text)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
README.md (1)
21-22: LGTM; blog file present. Verify the built docs link.
File exists: docs/source/blogs/tech_blog/blog12_Combining_Guided_Decoding_and_Speculative_Decoding.md. Verify the link resolves after building the docs.
/bot skip --comment "doc changes only"
PR_Github #19346 [ skip ] triggered by Bot
PR_Github #19346 [ skip ] completed with state
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
Currently, the image links are relative links for review purposes.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill
kill
Kill all running builds associated with the pull request.
skip
skip --comment COMMENT
Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.