
Conversation

@ixlmar (Collaborator) commented on Sep 30, 2025

Reverts #7909

Summary by CodeRabbit

  • Bug Fixes

    • Enforced deterministic generation in evaluation flows by setting temperature to 0, improving consistency of JSON-mode and MMLU results.
  • Tests

    • Updated chat completion tests to include temperature=0 for more reliable and predictable behavior.

@ixlmar force-pushed the revert-7909-test/batch-sampling-greedy branch from 097f8ae to 74e95ca on September 30, 2025 at 16:49
coderabbitai bot (Contributor) commented on Sep 30, 2025

📝 Walkthrough

Introduces explicit temperature=0 in evaluation and test code paths: adds "temperature": 0 to sampling arguments in json_mode_eval.py, updates generate_samples in mmlu.py to yield {"temperature": 0} instead of None, and sets temperature=0 in a specific OpenAI chat completion test.
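
For readers unfamiliar with the pattern, a minimal sketch of what "explicit temperature=0" means in practice is shown below; the helper and argument names are illustrative assumptions, not the repository's actual code — only the "temperature": 0 entry mirrors the described change.

from typing import Any, Dict, Optional

# Hypothetical helper: names are illustrative, not from the TensorRT-LLM codebase.
def build_sampling_args(max_tokens: int,
                        extra_args: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    sampling_args = {
        "max_tokens": max_tokens,
        # Explicitly request greedy (deterministic) decoding rather than relying
        # on a backend-specific default temperature.
        "temperature": 0,
    }
    if extra_args:
        sampling_args.update(extra_args)
    return sampling_args

Passing the value explicitly avoids depending on whichever default temperature the serving backend would otherwise apply, which is the source of the reproducibility concern this revert addresses.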

Changes

  • Evaluation: explicit temperature in sampling — tensorrt_llm/evaluate/json_mode_eval.py, tensorrt_llm/evaluate/mmlu.py: Added {"temperature": 0} to sampling args; mmlu.generate_samples now yields a params dict with temperature=0 instead of None.
  • Tests: stabilize temperature in OpenAI misc — tests/unittest/llmapi/apps/_test_openai_misc.py: Set temperature=0 in the chat completion request within test_request_cancellation; added a FIXME comment.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks

❌ Failed checks (2 warnings)

  • Description Check ⚠️ Warning — The PR description contains only a single-line revert statement and omits the required template sections (summary header, "Description", "Test Coverage", and PR Checklist confirmation). Resolution: expand the description to follow the repository template with the summary header, a detailed explanation of the revert and its rationale, the relevant test coverage information, and the PR Checklist items.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (1 passed)

  • Title Check ✅ Passed — The title follows the required "[JIRA ticket][type] Summary" convention and clearly states that this PR reverts the previous change regarding explicit temperature=0 handling for greedy sampling, accurately reflecting the primary intent of the changeset.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1560cca and 74e95ca.

📒 Files selected for processing (3)
  • tensorrt_llm/evaluate/json_mode_eval.py (1 hunks)
  • tensorrt_llm/evaluate/mmlu.py (1 hunks)
  • tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/evaluate/mmlu.py
  • tensorrt_llm/evaluate/json_mode_eval.py
  • tests/unittest/llmapi/apps/_test_openai_misc.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/evaluate/mmlu.py
  • tensorrt_llm/evaluate/json_mode_eval.py
  • tests/unittest/llmapi/apps/_test_openai_misc.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/evaluate/mmlu.py
  • tensorrt_llm/evaluate/json_mode_eval.py
  • tests/unittest/llmapi/apps/_test_openai_misc.py
🧬 Code graph analysis (2)
tensorrt_llm/evaluate/json_mode_eval.py (1)
tensorrt_llm/sampling_params.py (1)
  • GuidedDecodingParams (15-37)
tests/unittest/llmapi/apps/_test_openai_misc.py (1)
tests/unittest/llmapi/apps/_test_openai_chat.py (2)
  • client (82-83)
  • model_name (21-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/evaluate/json_mode_eval.py (1)

66-69: LGTM! Explicit temperature setting restored.

The addition of "temperature": 0 to the sampling arguments ensures deterministic greedy sampling during JSON mode evaluation, which is appropriate for evaluation benchmarks where reproducibility is critical.
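
As a rough, hedged sketch of the restored pattern — the function name and key layout below are assumptions for illustration; only the "temperature": 0 entry mirrors the described change:

from typing import Any, Dict

# Illustrative only: the key layout is assumed; the point is that an explicit
# temperature accompanies the guided-decoding settings for JSON-mode evaluation.
def make_sampling_args(json_schema: str) -> Dict[str, Any]:
    return {
        "guided_decoding": {"json": json_schema},  # assumed layout for schema-constrained output
        "temperature": 0,  # greedy sampling for reproducible benchmark results
    }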

tests/unittest/llmapi/apps/_test_openai_misc.py (1)

97-103: Documented workaround for known issue.

The addition of temperature=0 with the accompanying FIXME comment properly documents the workaround for the issue where requests complete too quickly without explicit temperature settings. The bug tracker reference (nvbugs/5513423) is helpful for tracking resolution.

Consider adding a test tracking marker or issue link to ensure this workaround is revisited once the underlying bug is fixed:

@pytest.mark.xfail(reason="Workaround for nvbugs/5513423", strict=False)

or monitor the bug tracker to remove the workaround when the issue is resolved.
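
A minimal sketch of the suggested marker in use, assuming a placeholder test; pytest.mark.xfail with reason and strict is standard pytest, everything else here is illustrative:

import pytest

# Placeholder test: the marker documents that the assertion relies on the
# explicit temperature=0 workaround and should be revisited later.
@pytest.mark.xfail(reason="Workaround for nvbugs/5513423", strict=False)
def test_request_cancellation_workaround():
    temperature = 0  # FIXME: drop the explicit value once nvbugs/5513423 is resolved
    assert temperature == 0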

tensorrt_llm/evaluate/mmlu.py (1)

222-222: LGTM! Explicit temperature setting for deterministic evaluation.

Changing from None to {"temperature": 0} ensures deterministic greedy sampling for MMLU evaluation, which is critical for reproducible benchmark results. This change is consistent with the pattern established across other evaluation modules in this PR.
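
Schematically, the change can be pictured as a generator yielding an explicit per-sample parameter dict instead of None; the signature below is an assumption for illustration, not the actual mmlu.py code:

from typing import Any, Dict, Iterable, Iterator, Tuple

# Sketch with an assumed signature: each prompt is paired with explicit sampling
# parameters so every generation runs greedily during evaluation.
def generate_samples(prompts: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    for prompt in prompts:
        # Previously the second element was None; yielding a dict makes the
        # greedy-decoding intent explicit to the caller.
        yield prompt, {"temperature": 0}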



@ixlmar changed the title from Revert "[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling" to [TRTLLM-8269][fix] Revert "do not explicitly pass temperature=0 to select greedy sampling" on Sep 30, 2025
@ixlmar (Collaborator, Author) commented on Sep 30, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator) commented:
PR_Github #20402 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:
PR_Github #20402 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15394 completed with status: 'SUCCESS'

@Tabrizian merged commit ee5ae49 into NVIDIA:main on Sep 30, 2025
8 of 9 checks passed
@ixlmar deleted the revert-7909-test/batch-sampling-greedy branch on October 1, 2025 at 08:06
@ixlmar (Collaborator, Author) commented on Oct 1, 2025

/bot run --only-multi-gpu-test --disable-fail-fast

faradawn pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 2, 2025
…lect greedy sampling" (NVIDIA#8103)

Signed-off-by: ixlmar <[email protected]>
Signed-off-by: Faradawn Yang <[email protected]>
Funatiq pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 3, 2025
…lect greedy sampling" (NVIDIA#8103)

Signed-off-by: ixlmar <[email protected]>
Signed-off-by: Faradawn Yang <[email protected]>
evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request Oct 3, 2025
faradawn pushed a commit to faradawn/TensorRT-LLM that referenced this pull request Oct 3, 2025
…lect greedy sampling" (NVIDIA#8103)

Signed-off-by: ixlmar <[email protected]>
Signed-off-by: Faradawn Yang <[email protected]>
