
Conversation

@ixlmar (Collaborator) commented on Sep 30, 2025

Reverts #7909

Summary by CodeRabbit

  • Bug Fixes

    • Enforced deterministic generation in evaluation flows by setting temperature to 0, improving consistency of JSON-mode and MMLU results.
  • Tests

    • Updated chat completion tests to include temperature=0 for more reliable and predictable behavior.

@ixlmar force-pushed the revert-7909-test/batch-sampling-greedy branch from 097f8ae to 74e95ca on September 30, 2025 at 16:49
coderabbitai bot (Contributor) commented on Sep 30, 2025

📝 Walkthrough

Introduces explicit temperature=0 in evaluation and test code paths: adds "temperature": 0 to sampling arguments in json_mode_eval.py, updates generate_samples in mmlu.py to yield {"temperature": 0} instead of None, and sets temperature=0 in a specific OpenAI chat completion test.
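
For readers unfamiliar with the pattern, a minimal sketch of what "explicit temperature=0" means in practice is shown below; the helper and argument names are illustrative assumptions, not the repository's actual code — only the "temperature": 0 entry mirrors the described change.

from typing import Any, Dict, Optional

# Hypothetical helper: names are illustrative, not from the TensorRT-LLM codebase.
def build_sampling_args(max_tokens: int,
                        extra_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    sampling_args = {
        "max_tokens": max_tokens,
        # Explicitly request greedy (deterministic) decoding rather than relying
        # on a backend-specific default temperature.
        "temperature": 0,
    }
    if extra_args:
        sampling_args.update(extra_args)
    return sampling_args

Passing the value explicitly avoids depending on whichever default temperature the serving backend would otherwise apply, which is the source of the reproducibility concern this revert addresses.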

Changes

  • Evaluation: explicit temperature in sampling — tensorrt_llm/evaluate/json_mode_eval.py, tensorrt_llm/evaluate/mmlu.py: Added {"temperature": 0} to sampling args; mmlu.generate_samples now yields a params dict with temperature=0 instead of None.
  • Tests: stabilize temperature in OpenAI misc — tests/unittest/llmapi/apps/_test_openai_misc.py: Set temperature=0 in the chat completion request within test_request_cancellation; added a FIXME comment.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks

❌ Failed checks (2 warnings)

  • Description Check ⚠️ Warning — The PR description contains only a single-line revert statement and omits the required template sections (summary header, "Description", "Test Coverage", and PR Checklist confirmation). Resolution: expand the description to follow the repository template with the summary header, a detailed explanation of the revert and its rationale, the relevant test coverage information, and the PR Checklist items.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (1 passed)

  • Title Check ✅ Passed — The title follows the required "[JIRA ticket][type] Summary" convention and clearly states that this PR reverts the previous change regarding explicit temperature=0 handling for greedy sampling, accurately reflecting the primary intent of the changeset.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1560cca and 74e95ca.

📒 Files selected for processing (3)
  • tensorrt_llm/evaluate/json_mode_eval.py (1 hunks)
  • tensorrt_llm/evaluate/mmlu.py (1 hunks)
  • tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/evaluate/mmlu.py
  • tensorrt_llm/evaluate/json_mode_eval.py
  • tests/unittest/llmapi/apps/_test_openai_misc.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/evaluate/mmlu.py
  • tensorrt_llm/evaluate/json_mode_eval.py
  • tests/unittest/llmapi/apps/_test_openai_misc.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/evaluate/mmlu.py
  • tensorrt_llm/evaluate/json_mode_eval.py
  • tests/unittest/llmapi/apps/_test_openai_misc.py
🧬 Code graph analysis (2)
tensorrt_llm/evaluate/json_mode_eval.py (1)
tensorrt_llm/sampling_params.py (1)
  • GuidedDecodingParams (15-37)
tests/unittest/llmapi/apps/_test_openai_misc.py (1)
tests/unittest/llmapi/apps/_test_openai_chat.py (2)
  • client (82-83)
  • model_name (21-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/evaluate/json_mode_eval.py (1)

66-69: LGTM! Explicit temperature setting restored.

The addition of "temperature": 0 to the sampling arguments ensures deterministic greedy sampling during JSON mode evaluation, which is appropriate for evaluation benchmarks where reproducibility is critical.
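
As a rough, hedged sketch of the restored pattern — the function name and key layout below are assumptions for illustration; only the "temperature": 0 entry mirrors the described change:

from typing import Any, Dict

# Illustrative only: the key layout is assumed; the point is that an explicit
# temperature accompanies the guided-decoding settings for JSON-mode evaluation.
def make_sampling_args(json_schema: str) -> Dict[str, Any]:
    return {
        "guided_decoding": {"json": json_schema},  # assumed layout for schema-constrained output
        "temperature": 0,  # greedy sampling for reproducible benchmark results
    }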

tests/unittest/llmapi/apps/_test_openai_misc.py (1)

97-103: Documented workaround for known issue.

The addition of temperature=0 with the accompanying FIXME comment properly documents the workaround for the issue where requests complete too quickly without explicit temperature settings. The bug tracker reference (nvbugs/5513423) is helpful for tracking resolution.

Consider adding a test tracking marker or issue link to ensure this workaround is revisited once the underlying bug is fixed:

@pytest.mark.xfail(reason="Workaround for nvbugs/5513423", strict=False)

or monitor the bug tracker to remove the workaround when the issue is resolved.
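
A minimal sketch of the suggested marker in use, assuming a placeholder test; pytest.mark.xfail with reason and strict is standard pytest, everything else here is illustrative:

import pytest

# Placeholder test: the marker documents that the assertion relies on the
# explicit temperature=0 workaround and should be revisited later.
@pytest.mark.xfail(reason="Workaround for nvbugs/5513423", strict=False)
def test_request_cancellation_workaround():
    temperature = 0  # FIXME: drop the explicit value once nvbugs/5513423 is resolved
    assert temperature == 0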

tensorrt_llm/evaluate/mmlu.py (1)

222-222: LGTM! Explicit temperature setting for deterministic evaluation.

Changing from None to {"temperature": 0} ensures deterministic greedy sampling for MMLU evaluation, which is critical for reproducible benchmark results. This change is consistent with the pattern established across other evaluation modules in this PR.
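
Schematically, the change can be pictured as a generator yielding an explicit per-sample parameter dict instead of None; the signature below is an assumption for illustration, not the actual mmlu.py code:

from typing import Any, Dict, Iterable, Iterator, Tuple

# Sketch with an assumed signature: each prompt is paired with explicit sampling
# parameters so every generation runs greedily during evaluation.
def generate_samples(prompts: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    for prompt in prompts:
        # Previously the second element was None; yielding a dict makes the
        # greedy-decoding intent explicit to the caller.
        yield prompt, {"temperature": 0}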



@ixlmar changed the title from Revert "[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling" to [TRTLLM-8269][fix] Revert "do not explicitly pass temperature=0 to select greedy sampling" on Sep 30, 2025
@ixlmar (Collaborator, Author) commented on Sep 30, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator) commented:
PR_Github #20402 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:
PR_Github #20402 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15394 completed with status: 'SUCCESS'

@Tabrizian merged commit ee5ae49 into NVIDIA:main on Sep 30, 2025
8 of 9 checks passed
@ixlmar deleted the revert-7909-test/batch-sampling-greedy branch on October 1, 2025 at 08:06
@ixlmar (Collaborator, Author) commented on Oct 1, 2025

/bot run --only-multi-gpu-test --disable-fail-fast

faradawn pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 2, 2025
…lect greedy sampling" (NVIDIA#8103)

Signed-off-by: ixlmar <[email protected]>
Signed-off-by: Faradawn Yang <[email protected]>
Funatiq pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 3, 2025
…lect greedy sampling" (NVIDIA#8103)

Signed-off-by: ixlmar <[email protected]>
Signed-off-by: Faradawn Yang <[email protected]>
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request Oct 3, 2025
faradawn pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 3, 2025
…lect greedy sampling" (NVIDIA#8103)

Signed-off-by: ixlmar <[email protected]>
Signed-off-by: Faradawn Yang <[email protected]>
