[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend #6097

LinPoly · 2025-07-16T13:48:54Z

Description

TopK logprobs for TRT backend with gathering context & generation logits
Top1 logprobs for PyT backend
For now some length mismatch happens randomly, still need to investigate.

Test Coverage

Will be added.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

New Features
- Added support for returning top-k log probabilities per token in chat completions, enabling detailed token probability insights.
- Enhanced API to optionally include top log probabilities in chat completion responses.
- Extended request parameters to better control logprobs gathering and backend-specific behavior.
Bug Fixes
- Improved validation and handling of logprobs and top_logprobs parameters in chat completion requests.
- Refined internal logprob state management for more accurate probability tracking.
Tests
- Added new integration and unit tests validating top-k logprobs functionality on PyTorch and TensorRT backends.
- Consolidated and updated existing tests for logprobs and chat completions, removing redundant streaming tests.
Documentation
- Updated parameter validation logic and API behavior for logprobs and top_logprobs fields in chat completion requests.

coderabbitai · 2025-07-23T14:38:07Z

📝 Walkthrough

Walkthrough

This change introduces support for top-k log probabilities in chat completions, updating request and response schemas, validation logic, and postprocessing to handle and return detailed per-token logprobs. It also adds new integration and unit tests for this feature, adjusts type annotations, modifies related sampling parameter logic and server invocation, and removes some legacy and streaming tests.

Changes

File(s)	Change Summary
Executor module cleanup `tensorrt_llm/executor/postproc_worker.py`	Removed unused import of `zmq.asyncio`.
Result handling and type updates `tensorrt_llm/executor/result.py`	Expanded type annotations for `logprobs` and `logprobs_diff` fields to allow `TokenLogprobs` or `List[float]`; modified `_handle_sequence` to append or assign logprobs conditionally and update internal state tracking.
OpenAI protocol schema and validation `tensorrt_llm/serve/openai_protocol.py`	Changed `logprobs` and `top_logprobs` field types and defaults in request schemas; updated validation logic to allow zero or positive `top_logprobs` and enforce consistency; modified `to_sampling_params` to handle new logprobs logic with backend-specific behavior; changed `max_completion_tokens` to optional.
Server invocation update `tensorrt_llm/serve/openai_server.py`	Updated call to `request.to_sampling_params()` to pass `gather_generation_logits` and `backend` flags from LLM configuration.
Postprocessing enhancements `tensorrt_llm/serve/postprocess_handlers.py`	Added support for top-k logprobs in postprocessing: new `top_logprobs` flag in `ChatPostprocArgs`, updated `create_logprobs` to handle `logprobs` as list or dict with top logprobs details, and adjusted postprocessor functions to pass the flag.
Integration test addition `tests/integration/defs/test_e2e.py`	Added new parameterized test `test_trtllm_serve_top_logprobs` for top-k logprobs serving, running pytest on a dedicated test script with backend filtering.
Integration test list update `tests/integration/test_lists/test-db/l0_a10.yml`	Registered new top-k logprobs test cases for both PyTorch and TRT backends in the integration test list.
Unit test refactoring and removals `tests/unittest/llmapi/apps/_test_openai_chat.py`	Consolidated logprobs testing into the main chat session test; renamed a test for multiple responses; removed async streaming chat completion tests with and without logprobs.
New unit tests for top-k logprobs `tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py`	Added new async test module validating top-k logprobs in chat completions, with fixtures for model, backend, config, server, and client; includes tests for both top-5 and top-1 logprobs per backend.
API stability reference update `tests/unittest/api_stability/references/completion_output.yaml`	Expanded `logprobs_diff` property type annotation to union of list of dicts or list of floats.
API stability committed reference update `tests/unittest/api_stability/references_committed/completion_output.yaml`	Expanded `logprobs` parameter type annotation in `__init__` to allow list of dicts or list of floats.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ChatCompletionRequest
    participant Postprocessor
    participant ModelBackend

    Client->>OpenAIServer: POST /chat/completions (logprobs, top_logprobs)
    OpenAIServer->>ChatCompletionRequest: Parse and validate request
    ChatCompletionRequest->>OpenAIServer: to_sampling_params(gather_generation_logits, backend)
    OpenAIServer->>ModelBackend: Generate completion (with sampling params)
    ModelBackend-->>OpenAIServer: Generated tokens + logprobs
    OpenAIServer->>Postprocessor: Postprocess response (top_logprobs flag)
    Postprocessor-->>OpenAIServer: Response with per-token logprobs/top_logprobs
    OpenAIServer-->>Client: Return chat completion with logprobs/top_logprobs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

litaotju
kaiyux
nv-guomingz
pcastonguay

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

LinPoly · 2025-07-23T14:44:49Z

/bot run

tensorrt-cicd · 2025-07-23T14:50:15Z

PR_Github #12717 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (1)
99-135: Consider testing top_logprobs=1 on both backends.

While the test logic is correct, the skip condition might result in incomplete coverage. The TRT backend is only tested with top_logprobs=5 in the other test, but not with top_logprobs=1. Consider testing both backends for the k=1 case to ensure complete coverage.
-    # Skip if backend is TRT because it is tested in test_chat_completion_top5_logprobs
-    if backend == "trt":
-        pytest.skip(
-            "TRT top logprobs is already tested in test_chat_completion_top5_logprobs"
-        )
+    # Skip if backend is PyTorch and we want to test k > 1 (PyTorch only supports k=1)
+    # For this test, both backends should work with top_logprobs=1
Alternatively, you could rename this test to be more specific about testing PyTorch-only functionality if k=1 on TRT is indeed redundant.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a51c4b and 0da4cf9.

📒 Files selected for processing (6)

tensorrt_llm/executor/postproc_worker.py (0 hunks)
tests/integration/defs/test_e2e.py (1 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (1 hunks)
tests/unittest/llmapi/apps/openai_server.py (1 hunks)

💤 Files with no reviewable changes (1)

tensorrt_llm/executor/postproc_worker.py

🚧 Files skipped from review as they are similar to previous changes (3)

tests/unittest/llmapi/apps/openai_server.py
tests/integration/defs/test_e2e.py
tests/integration/test_lists/test-db/l0_a10.yml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (6)

tests/unittest/llmapi/apps/_test_openai_chat.py (3)

142-154: LGTM! Good consolidation of logprobs testing.

The integration of logprobs testing into the main test function simplifies the test structure while maintaining proper validation of logprobs fields. The assertion that top_logprobs is None is correct since the top_logprobs parameter is not provided in the request.

177-181: Good improvement with function rename and updated skip message.

The rename to test_multiple_responses is more grammatically correct, and the updated skip message provides clearer context about backend support limitations.

215-292: Confirm streaming logprobs behavior for chat completions.

I didn’t find any TODOs, FIXMEs, or comments suggesting known issues with streaming logprobs, and the existing tests in tests/unittest/llmapi/apps/_test_openai_chat.py and tests/unittest/llmapi/test_llm.py still cover non-streaming vs. streaming consistency. Please run these streaming tests end-to-end against the updated logprobs implementation to ensure they still pass without errors.

tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (3)

14-43: Well-structured fixture setup for logprobs testing.

The fixtures properly configure the test environment with gather_generation_logits: True which is essential for logprobs functionality. The temporary file cleanup is handled correctly.

46-59: Clean server fixture implementation.

The server fixture correctly configures the RemoteOpenAIServer with the necessary parameters and uses proper resource management with context managers.

62-96: Comprehensive test for top-5 logprobs functionality.

The test properly validates the TensorRT backend's top-k logprobs feature with thorough assertions. The skip condition for PyTorch is correct, and the use of ignore_eos=True ensures predictable token counts for testing.

tensorrt-cicd · 2025-07-23T15:19:27Z

PR_Github #12717 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9464 completed with status: 'FAILURE'

LinPoly · 2025-07-24T09:16:07Z

/bot run

tests/unittest/llmapi/apps/openai_server.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (2)
62-97: Strong test implementation with one minor suggestion.

The test correctly validates top-5 logprobs for TRT backend with comprehensive assertions. The skip logic for PyTorch is appropriate.

Consider adding a comment explaining why ignore_eos: True is used in the extra_body:
         extra_body={
+            # Ignore EOS to ensure consistent token count for testing
             "ignore_eos": True,
         })
99-136: Well-implemented PyTorch top-1 logprobs test.

The test correctly validates top-1 logprobs for PyTorch backend and includes appropriate assertions. The skip logic for TRT is reasonable to avoid redundant testing.

Same suggestion as the previous test - consider adding a comment for the ignore_eos: True setting for clarity.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0da4cf9 and 02a3ebe.

📒 Files selected for processing (10)

tensorrt_llm/executor/postproc_worker.py (0 hunks)
tensorrt_llm/executor/result.py (2 hunks)
tensorrt_llm/serve/openai_protocol.py (5 hunks)
tensorrt_llm/serve/openai_server.py (1 hunks)
tensorrt_llm/serve/postprocess_handlers.py (5 hunks)
tests/integration/defs/test_e2e.py (1 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (1 hunks)
tests/unittest/llmapi/apps/openai_server.py (1 hunks)

💤 Files with no reviewable changes (1)

tensorrt_llm/executor/postproc_worker.py

🚧 Files skipped from review as they are similar to previous changes (7)

tests/unittest/llmapi/apps/openai_server.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/integration/defs/test_e2e.py
tensorrt_llm/serve/openai_server.py
tensorrt_llm/executor/result.py
tensorrt_llm/serve/postprocess_handlers.py
tensorrt_llm/serve/openai_protocol.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py
tests/unittest/llmapi/apps/_test_openai_chat.py

**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py
tests/unittest/llmapi/apps/_test_openai_chat.py

🔇 Additional comments (9)

tests/unittest/llmapi/apps/_test_openai_chat.py (3)

142-155: LGTM: Good integration of logprobs testing into existing test.

The logprobs test section is well-structured and validates the essential properties of the logprob response. The assertions correctly check for non-null token, logprob, bytes, and verify that top_logprobs is None (as expected for basic logprobs without top_k).

177-181: Good function rename and updated skip message.

The function rename from test_multiple_response to test_multiple_responses improves grammar, and the updated skip message better reflects the actual limitation (multiple responses vs beam search).

215-292: Ensure streaming logprobs consistency across all backends

The OpenAI‐specific streaming test in tests/unittest/llmapi/apps/_test_openai_chat.py covers one backend, but PR notes mention “random length mismatches.” Please extend or reuse your existing logprobs harness to verify that for every supported backend:

Streaming and non-streaming logprobs produce the same number of token scores.

The per-token logprob values match (or fall within an acceptable tolerance if backend noise is expected).

Key locations to update or audit:
• tests/unittest/llmapi/apps/_test_openai_chat.py (lines 215–292)
• tests/unittest/llmapi/test_llm.py → test_llm_return_logprobs_streaming (llm_return_logprobs_test_harness)

Consider parametrizing the harness over all backends/models and asserting:
assert len(streaming_logprobs) == len(non_streaming_logprobs)
assert np.allclose(streaming_logprobs, non_streaming_logprobs)
tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (6)

1-12: LGTM: Proper imports and test setup.

The imports are appropriate and follow the namespace convention. The threadleak marking is consistent with other test files.

14-22: LGTM: Well-structured fixtures.

The model and backend fixtures are properly configured with appropriate scope and parameterization.

24-44: LGTM: Proper temporary file management with essential configuration.

The fixture correctly manages temporary file lifecycle and includes the crucial gather_generation_logits: True setting needed for logprobs functionality.

46-55: LGTM: Clean server fixture setup.

The server fixture properly integrates the temporary config file and follows the established pattern.

57-60: LGTM: Appropriate async client fixture.

The async client fixture is correctly set up for the asynchronous test functions.

62-136: Ensure logprobs.content always matches max_completion_tokens under ignore_eos=True

I wasn’t able to find definitive evidence in the codebase that setting ignore_eos=True will force the server to generate exactly max_completion_tokens tokens (and thus that len(logprobs.content) == max_completion_tokens in every case, even if an early EOS is predicted). Please manually verify that:

With ignore_eos=True, the service does not stop on EOS and always emits exactly max_completion_tokens entries in logprobs.content.

Both TRT and other backends exhibit the same behavior.

Optionally, add a test where the model returns EOS early (e.g. at token 3) to confirm that the returned logprobs.content list is still length 10.

tensorrt-cicd · 2025-07-24T09:21:09Z

PR_Github #12835 [ run ] triggered by Bot

LinPoly · 2025-07-24T09:23:03Z

/bot kill

tensorrt-cicd · 2025-07-24T09:28:10Z

PR_Github #12836 [ kill ] triggered by Bot

tensorrt-cicd · 2025-07-24T09:28:11Z

PR_Github #12835 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-07-24T09:28:41Z

PR_Github #12836 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 2b793cd

LinPoly · 2025-07-24T09:58:58Z

/bot run

tensorrt-cicd · 2025-07-24T10:04:09Z

PR_Github #12841 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-24T13:43:41Z

PR_Github #12841 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9572 completed with status: 'FAILURE'

LinPoly · 2025-09-09T13:10:11Z

/bot run

tensorrt-cicd · 2025-09-09T13:15:33Z

PR_Github #18229 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-09T23:16:40Z

PR_Github #18229 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #13662 completed with status: 'FAILURE'

2. Top1 logprobs in PyT backend Signed-off-by: Pengyun Lin <[email protected]>

Signed-off-by: Pengyun Lin <[email protected]>

LinPoly · 2025-09-10T06:04:09Z

/bot run

tensorrt-cicd · 2025-09-10T06:16:30Z

PR_Github #18307 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-10T06:47:49Z

PR_Github #18307 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #13731 completed with status: 'FAILURE'

LinPoly · 2025-09-10T07:38:29Z

/bot run

tensorrt-cicd · 2025-09-10T07:44:13Z

PR_Github #18336 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-10T08:12:36Z

PR_Github #18336 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #13752 completed with status: 'FAILURE'

LinPoly · 2025-09-10T12:06:30Z

/bot run

tensorrt-cicd · 2025-09-10T12:13:40Z

PR_Github #18365 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-11T00:29:23Z

PR_Github #18365 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13774 completed with status: 'FAILURE'

LinPoly · 2025-09-12T04:39:47Z

/bot run

tensorrt-cicd · 2025-09-12T04:45:02Z

PR_Github #18461 [ run ] triggered by Bot

tensorrt-cicd · 2025-09-12T07:30:03Z

PR_Github #18461 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13850 completed with status: 'SUCCESS'

…r PyT backend (NVIDIA#6097) Signed-off-by: Pengyun Lin <[email protected]>

LinPoly requested a review from a team as a code owner July 16, 2025 13:48

LinPoly requested a review from nv-guomingz July 16, 2025 13:48

LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from c798360 to 8a51c4b Compare July 23, 2025 14:38

LinPoly requested review from hchings, syuoni and tongyuantongyu July 23, 2025 14:40

LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from 8a51c4b to 0da4cf9 Compare July 23, 2025 14:43

LinPoly changed the title ~~[draft][TRTLLM-1302][feat]: topk logprobs for TRT backend & top1 logprob for PyT backend~~ [TRTLLM-1302][feat]: topk logprobs for TRT backend & top1 logprob for PyT backend Jul 23, 2025

coderabbitai bot reviewed Jul 23, 2025

View reviewed changes

LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from 0da4cf9 to 02a3ebe Compare July 24, 2025 09:15

coderabbitai bot requested review from HuiGao-NV and shaharmor98 July 24, 2025 09:16

coderabbitai bot added the Community want to contribute PRs initiated from Community label Jul 24, 2025

tongyuantongyu reviewed Jul 24, 2025

View reviewed changes

tests/unittest/llmapi/apps/openai_server.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Jul 24, 2025

View reviewed changes

LinPoly requested review from a team as code owners July 25, 2025 09:28

LinPoly added 10 commits September 10, 2025 06:02

1. Topk logprobs in TRT backend

a46c4f6

2. Top1 logprobs in PyT backend Signed-off-by: Pengyun Lin <[email protected]>

Remove debug log

c737d72

Signed-off-by: Pengyun Lin <[email protected]>

Add test & refine code

3c5b779

Signed-off-by: Pengyun Lin <[email protected]>

Fix test

4245be3

Signed-off-by: Pengyun Lin <[email protected]>

Revert change

b83b5ea

Signed-off-by: Pengyun Lin <[email protected]>

Fix types

afcbb85

Signed-off-by: Pengyun Lin <[email protected]>

Fix api stability & types

380b2aa

Signed-off-by: Pengyun Lin <[email protected]>

Minor enhancement

f88683c

Signed-off-by: Pengyun Lin <[email protected]>

Fix

6521084

Signed-off-by: Pengyun Lin <[email protected]>

Fix

6c1dc41

Signed-off-by: Pengyun Lin <[email protected]>

LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from 26dd798 to 6c1dc41 Compare September 10, 2025 06:03

QiJune approved these changes Sep 10, 2025

View reviewed changes

LinPoly merged commit c2bc39a into NVIDIA:main Sep 12, 2025
5 checks passed

Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025

[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob fo…

116f1ae

…r PyT backend (NVIDIA#6097) Signed-off-by: Pengyun Lin <[email protected]>

MrGeva pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Sep 21, 2025

[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob fo…

bbab97b

…r PyT backend (NVIDIA#6097) Signed-off-by: Pengyun Lin <[email protected]>

[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend #6097

[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend #6097

Uh oh!

Conversation

LinPoly commented Jul 16, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Test Coverage

GitHub Bot Help

kill

skip

reuse-pipeline

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

LinPoly commented Jul 23, 2025

Uh oh!

tensorrt-cicd commented Jul 23, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented Jul 23, 2025

Uh oh!

LinPoly commented Jul 24, 2025

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented Jul 24, 2025

Uh oh!

LinPoly commented Jul 24, 2025

Uh oh!

tensorrt-cicd commented Jul 24, 2025

Uh oh!

tensorrt-cicd commented Jul 24, 2025

Uh oh!

tensorrt-cicd commented Jul 24, 2025

Uh oh!

LinPoly commented Jul 24, 2025

Uh oh!

tensorrt-cicd commented Jul 24, 2025

Uh oh!

tensorrt-cicd commented Jul 24, 2025

Uh oh!

LinPoly commented Sep 9, 2025

Uh oh!

tensorrt-cicd commented Sep 9, 2025

Uh oh!

tensorrt-cicd commented Sep 9, 2025

Uh oh!

LinPoly commented Sep 10, 2025

Uh oh!

tensorrt-cicd commented Sep 10, 2025

Uh oh!

tensorrt-cicd commented Sep 10, 2025

Uh oh!

LinPoly commented Sep 10, 2025

Uh oh!

tensorrt-cicd commented Sep 10, 2025

Uh oh!

tensorrt-cicd commented Sep 10, 2025

Uh oh!

LinPoly commented Sep 10, 2025

Uh oh!

tensorrt-cicd commented Sep 10, 2025

Uh oh!

LinPoly commented Jul 16, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 23, 2025 •

edited

Loading