
Conversation

@LinPoly (Collaborator) commented Jul 16, 2025

Description

  1. Top-k logprobs for the TRT backend, gathering context and generation logits
  2. Top-1 logprobs for the PyTorch backend
    For now, a length mismatch occasionally occurs at random and still needs to be investigated. An example request that exercises this feature is sketched below.
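
As a rough illustration (not part of this PR's code), a client request that exercises top-k logprobs against the OpenAI-compatible endpoint served by trtllm-serve might look like the following; the base URL, model name, and top_logprobs value are placeholders:

# Hypothetical client-side example; endpoint, model name, and k are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="TinyLlama-1.1B-Chat-v1.0",  # whatever model trtllm-serve is hosting
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=16,
    logprobs=True,     # return per-token logprobs
    top_logprobs=5,    # top-k candidates per token (TRT backend; PyT currently supports top-1)
)

for entry in response.choices[0].logprobs.content:
    print(entry.token, entry.logprob, len(entry.top_logprobs or []))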

Test Coverage

Will be added.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
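
For example, a developer might post the following comment to launch the pipeline without fail-fast and restrict it to a single stage (the stage name here is taken from the examples above):

/bot run --disable-fail-fast --stage-list "A10-1"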

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since insufficient care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since insufficient care and validation can break the top of tree.

Summary by CodeRabbit

  • New Features

    • Added support for returning top-k log probabilities per token in chat completions, enabling detailed token probability insights.
    • Enhanced API to optionally include top log probabilities in chat completion responses.
    • Extended request parameters to better control logprobs gathering and backend-specific behavior.
  • Bug Fixes

    • Improved validation and handling of logprobs and top_logprobs parameters in chat completion requests.
    • Refined internal logprob state management for more accurate probability tracking.
  • Tests

    • Added new integration and unit tests validating top-k logprobs functionality on PyTorch and TensorRT backends.
    • Consolidated and updated existing tests for logprobs and chat completions, removing redundant streaming tests.
  • Documentation

    • Updated parameter validation logic and API behavior for logprobs and top_logprobs fields in chat completion requests.

@LinPoly LinPoly requested a review from a team as a code owner July 16, 2025 13:48
@LinPoly LinPoly requested a review from nv-guomingz July 16, 2025 13:48
@LinPoly LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from c798360 to 8a51c4b on July 23, 2025 14:38
coderabbitai bot (Contributor) commented Jul 23, 2025

📝 Walkthrough

This change introduces support for top-k log probabilities in chat completions, updating request and response schemas, validation logic, and postprocessing to handle and return detailed per-token logprobs. It also adds new integration and unit tests for this feature, adjusts type annotations, modifies related sampling parameter logic and server invocation, and removes some legacy and streaming tests.
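
As a rough illustration of the request-side validation described here (a minimal sketch, not the actual ChatCompletionRequest in tensorrt_llm/serve/openai_protocol.py), the consistency rules for logprobs and top_logprobs could look like this:

# Hedged sketch only; field names follow the OpenAI chat schema, and the real
# implementation in openai_protocol.py may differ in structure and defaults.
from typing import Optional
from pydantic import BaseModel, model_validator

class ChatCompletionRequestSketch(BaseModel):
    logprobs: Optional[bool] = False
    top_logprobs: Optional[int] = None
    max_completion_tokens: Optional[int] = None  # optional as of this change

    @model_validator(mode="after")
    def _check_logprobs(self):
        if self.top_logprobs is not None:
            if self.top_logprobs < 0:
                raise ValueError("top_logprobs must be zero or a positive integer")
            if not self.logprobs:
                raise ValueError("top_logprobs requires logprobs to be enabled")
        return self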

Changes

  • Executor module cleanup (tensorrt_llm/executor/postproc_worker.py): removed the unused import of zmq.asyncio.
  • Result handling and type updates (tensorrt_llm/executor/result.py): expanded type annotations for the logprobs and logprobs_diff fields to allow TokenLogprobs or List[float]; modified _handle_sequence to append or assign logprobs conditionally and to update internal state tracking.
  • OpenAI protocol schema and validation (tensorrt_llm/serve/openai_protocol.py): changed logprobs and top_logprobs field types and defaults in the request schemas; updated validation logic to allow zero or positive top_logprobs and enforce consistency; modified to_sampling_params to handle the new logprobs logic with backend-specific behavior; changed max_completion_tokens to optional.
  • Server invocation update (tensorrt_llm/serve/openai_server.py): updated the call to request.to_sampling_params() to pass gather_generation_logits and backend flags from the LLM configuration.
  • Postprocessing enhancements (tensorrt_llm/serve/postprocess_handlers.py): added support for top-k logprobs in postprocessing: a new top_logprobs flag in ChatPostprocArgs, an updated create_logprobs that handles logprobs as a list or a dict with top-logprob details, and adjusted postprocessor functions to pass the flag (a rough sketch follows this list).
  • Integration test addition (tests/integration/defs/test_e2e.py): added a new parameterized test test_trtllm_serve_top_logprobs for top-k logprobs serving, running pytest on a dedicated test script with backend filtering.
  • Integration test list update (tests/integration/test_lists/test-db/l0_a10.yml): registered the new top-k logprobs test cases for both the PyTorch and TRT backends in the integration test list.
  • Unit test refactoring and removals (tests/unittest/llmapi/apps/_test_openai_chat.py): consolidated logprobs testing into the main chat session test; renamed a test for multiple responses; removed async streaming chat completion tests with and without logprobs.
  • New unit tests for top-k logprobs (tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py): added a new async test module validating top-k logprobs in chat completions, with fixtures for model, backend, config, server, and client; includes tests for both top-5 and top-1 logprobs per backend.
  • API stability reference update (tests/unittest/api_stability/references/completion_output.yaml): expanded the logprobs_diff property type annotation to a union of list of dicts or list of floats.
  • API stability committed reference update (tests/unittest/api_stability/references_committed/completion_output.yaml): expanded the logprobs parameter type annotation in __init__ to allow a list of dicts or a list of floats.
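
For orientation only, the create_logprobs change summarized above might resemble the following sketch; the data shapes and helper names are assumptions, not the actual implementation in tensorrt_llm/serve/postprocess_handlers.py:

# Hedged sketch: builds OpenAI-style per-token logprob entries from either a
# plain list of floats (PyT backend, top-1 only) or a list of per-step dicts
# mapping candidate token ids to logprob values (TRT backend, top-k).
# The tokenizer is assumed to expose an HF-style decode() method.
from typing import Dict, List, Union

def create_logprobs_sketch(
    token_ids: List[int],
    tokenizer,
    logprobs: Union[List[float], List[Dict[int, float]]],
    top_logprobs: bool,
) -> dict:
    content = []
    for token_id, step in zip(token_ids, logprobs):
        token = tokenizer.decode([token_id])
        if isinstance(step, dict):
            # TRT backend: per-step dict of candidate token id -> logprob.
            token_logprob = step[token_id]
            top = ([
                {"token": tokenizer.decode([tid]), "logprob": lp}
                for tid, lp in step.items()
            ] if top_logprobs else None)
        else:
            # PyT backend: a single float for the sampled token, no candidates.
            token_logprob = step
            top = None
        content.append({
            "token": token,
            "logprob": token_logprob,
            "bytes": list(token.encode("utf-8")),
            "top_logprobs": top,
        })
    return {"content": content}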

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ChatCompletionRequest
    participant Postprocessor
    participant ModelBackend

    Client->>OpenAIServer: POST /chat/completions (logprobs, top_logprobs)
    OpenAIServer->>ChatCompletionRequest: Parse and validate request
    ChatCompletionRequest->>OpenAIServer: to_sampling_params(gather_generation_logits, backend)
    OpenAIServer->>ModelBackend: Generate completion (with sampling params)
    ModelBackend-->>OpenAIServer: Generated tokens + logprobs
    OpenAIServer->>Postprocessor: Postprocess response (top_logprobs flag)
    Postprocessor-->>OpenAIServer: Response with per-token logprobs/top_logprobs
    OpenAIServer-->>Client: Return chat completion with logprobs/top_logprobs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • litaotju
  • kaiyux
  • nv-guomingz
  • pcastonguay


@LinPoly LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from 8a51c4b to 0da4cf9 on July 23, 2025 14:43
@LinPoly LinPoly changed the title [draft][TRTLLM-1302][feat]: topk logprobs for TRT backend & top1 logprob for PyT backend [TRTLLM-1302][feat]: topk logprobs for TRT backend & top1 logprob for PyT backend Jul 23, 2025
@LinPoly (Collaborator Author) commented Jul 23, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #12717 [ run ] triggered by Bot

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (1)

99-135: Consider testing top_logprobs=1 on both backends.

While the test logic is correct, the skip condition might result in incomplete coverage. The TRT backend is only tested with top_logprobs=5 in the other test, but not with top_logprobs=1. Consider testing both backends for the k=1 case to ensure complete coverage.

-    # Skip if backend is TRT because it is tested in test_chat_completion_top5_logprobs
-    if backend == "trt":
-        pytest.skip(
-            "TRT top logprobs is already tested in test_chat_completion_top5_logprobs"
-        )
+    # Skip if backend is PyTorch and we want to test k > 1 (PyTorch only supports k=1)
+    # For this test, both backends should work with top_logprobs=1

Alternatively, you could rename this test to be more specific about testing PyTorch-only functionality if k=1 on TRT is indeed redundant.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a51c4b and 0da4cf9.

📒 Files selected for processing (6)
  • tensorrt_llm/executor/postproc_worker.py (0 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
  • tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
  • tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (1 hunks)
  • tests/unittest/llmapi/apps/openai_server.py (1 hunks)
💤 Files with no reviewable changes (1)
  • tensorrt_llm/executor/postproc_worker.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/unittest/llmapi/apps/openai_server.py
  • tests/integration/defs/test_e2e.py
  • tests/integration/test_lists/test-db/l0_a10.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (6)
tests/unittest/llmapi/apps/_test_openai_chat.py (3)

142-154: LGTM! Good consolidation of logprobs testing.

The integration of logprobs testing into the main test function simplifies the test structure while maintaining proper validation of logprobs fields. The assertion that top_logprobs is None is correct since the top_logprobs parameter is not provided in the request.


177-181: Good improvement with function rename and updated skip message.

The rename to test_multiple_responses is more grammatically correct, and the updated skip message provides clearer context about backend support limitations.


215-292: Confirm streaming logprobs behavior for chat completions.

I didn’t find any TODOs, FIXMEs, or comments suggesting known issues with streaming logprobs, and the existing tests in tests/unittest/llmapi/apps/_test_openai_chat.py and tests/unittest/llmapi/test_llm.py still cover non-streaming vs. streaming consistency. Please run these streaming tests end-to-end against the updated logprobs implementation to ensure they still pass without errors.

tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (3)

14-43: Well-structured fixture setup for logprobs testing.

The fixtures properly configure the test environment with gather_generation_logits: True which is essential for logprobs functionality. The temporary file cleanup is handled correctly.


46-59: Clean server fixture implementation.

The server fixture correctly configures the RemoteOpenAIServer with the necessary parameters and uses proper resource management with context managers.


62-96: Comprehensive test for top-5 logprobs functionality.

The test properly validates the TensorRT backend's top-k logprobs feature with thorough assertions. The skip condition for PyTorch is correct, and the use of ignore_eos=True ensures predictable token counts for testing.

@tensorrt-cicd (Collaborator)

PR_Github #12717 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9464 completed with status: 'FAILURE'

@LinPoly LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from 0da4cf9 to 02a3ebe on July 24, 2025 09:15
@LinPoly (Collaborator Author) commented Jul 24, 2025

/bot run

@coderabbitai coderabbitai bot requested review from HuiGao-NV and shaharmor98 July 24, 2025 09:16
@coderabbitai coderabbitai bot added the "Community want to contribute" label (PRs initiated from Community) on Jul 24, 2025
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (2)

62-97: Strong test implementation with one minor suggestion.

The test correctly validates top-5 logprobs for TRT backend with comprehensive assertions. The skip logic for PyTorch is appropriate.

Consider adding a comment explaining why ignore_eos: True is used in the extra_body:

         extra_body={
+            # Ignore EOS to ensure consistent token count for testing
             "ignore_eos": True,
         })

99-136: Well-implemented PyTorch top-1 logprobs test.

The test correctly validates top-1 logprobs for PyTorch backend and includes appropriate assertions. The skip logic for TRT is reasonable to avoid redundant testing.

Same suggestion as the previous test - consider adding a comment for the ignore_eos: True setting for clarity.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0da4cf9 and 02a3ebe.

📒 Files selected for processing (10)
  • tensorrt_llm/executor/postproc_worker.py (0 hunks)
  • tensorrt_llm/executor/result.py (2 hunks)
  • tensorrt_llm/serve/openai_protocol.py (5 hunks)
  • tensorrt_llm/serve/openai_server.py (1 hunks)
  • tensorrt_llm/serve/postprocess_handlers.py (5 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
  • tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
  • tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (1 hunks)
  • tests/unittest/llmapi/apps/openai_server.py (1 hunks)
💤 Files with no reviewable changes (1)
  • tensorrt_llm/executor/postproc_worker.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • tests/unittest/llmapi/apps/openai_server.py
  • tests/integration/test_lists/test-db/l0_a10.yml
  • tests/integration/defs/test_e2e.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/executor/result.py
  • tensorrt_llm/serve/postprocess_handlers.py
  • tensorrt_llm/serve/openai_protocol.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py
  • tests/unittest/llmapi/apps/_test_openai_chat.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py
  • tests/unittest/llmapi/apps/_test_openai_chat.py
🔇 Additional comments (9)
tests/unittest/llmapi/apps/_test_openai_chat.py (3)

142-155: LGTM: Good integration of logprobs testing into existing test.

The logprobs test section is well-structured and validates the essential properties of the logprob response. The assertions correctly check for non-null token, logprob, bytes, and verify that top_logprobs is None (as expected for basic logprobs without top_k).


177-181: Good function rename and updated skip message.

The function rename from test_multiple_response to test_multiple_responses improves grammar, and the updated skip message better reflects the actual limitation (multiple responses vs beam search).


215-292: Ensure streaming logprobs consistency across all backends

The OpenAI‐specific streaming test in tests/unittest/llmapi/apps/_test_openai_chat.py covers one backend, but PR notes mention “random length mismatches.” Please extend or reuse your existing logprobs harness to verify that for every supported backend:

  • Streaming and non-streaming logprobs produce the same number of token scores.
  • The per-token logprob values match (or fall within an acceptable tolerance if backend noise is expected).

Key locations to update or audit:
• tests/unittest/llmapi/apps/_test_openai_chat.py (lines 215–292)
• tests/unittest/llmapi/test_llm.py → test_llm_return_logprobs_streaming (llm_return_logprobs_test_harness)

Consider parametrizing the harness over all backends/models and asserting:

assert len(streaming_logprobs) == len(non_streaming_logprobs)
assert np.allclose(streaming_logprobs, non_streaming_logprobs)
tests/unittest/llmapi/apps/_test_trtllm_serve_top_logprobs.py (6)

1-12: LGTM: Proper imports and test setup.

The imports are appropriate and follow the namespace convention. The threadleak marking is consistent with other test files.


14-22: LGTM: Well-structured fixtures.

The model and backend fixtures are properly configured with appropriate scope and parameterization.


24-44: LGTM: Proper temporary file management with essential configuration.

The fixture correctly manages temporary file lifecycle and includes the crucial gather_generation_logits: True setting needed for logprobs functionality.


46-55: LGTM: Clean server fixture setup.

The server fixture properly integrates the temporary config file and follows the established pattern.


57-60: LGTM: Appropriate async client fixture.

The async client fixture is correctly set up for the asynchronous test functions.


62-136: Ensure logprobs.content always matches max_completion_tokens under ignore_eos=True

I wasn’t able to find definitive evidence in the codebase that setting ignore_eos=True will force the server to generate exactly max_completion_tokens tokens (and thus that len(logprobs.content) == max_completion_tokens in every case, even if an early EOS is predicted). Please manually verify that:

  • With ignore_eos=True, the service does not stop on EOS and always emits exactly max_completion_tokens entries in logprobs.content.
  • Both TRT and other backends exhibit the same behavior.
  • Optionally, add a test where the model returns EOS early (e.g. at token 3) to confirm that the returned logprobs.content list is still length 10.
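
A minimal sketch of such a check, assuming the async client and model_name fixtures from this test module (request fields here are illustrative, not prescriptive):

# Illustrative only: asserts that ignore_eos=True yields exactly
# max_completion_tokens logprob entries even if EOS would fire early.
import pytest

@pytest.mark.asyncio
async def test_logprobs_length_matches_max_tokens(client, model_name):
    max_tokens = 10
    response = await client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": "Say hi"}],
        max_completion_tokens=max_tokens,
        logprobs=True,
        top_logprobs=1,
        extra_body={"ignore_eos": True},  # keep generating past any early EOS
    )
    logprobs = response.choices[0].logprobs
    assert logprobs is not None
    assert len(logprobs.content) == max_tokens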

@tensorrt-cicd (Collaborator)

PR_Github #12835 [ run ] triggered by Bot

@LinPoly (Collaborator Author) commented Jul 24, 2025

/bot kill

@tensorrt-cicd (Collaborator)

PR_Github #12836 [ kill ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #12835 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator)

PR_Github #12836 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 2b793cd

@LinPoly (Collaborator Author) commented Jul 24, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #12841 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #12841 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9572 completed with status: 'FAILURE'

@LinPoly LinPoly requested review from a team as code owners July 25, 2025 09:28
@LinPoly (Collaborator Author) commented Sep 9, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #18229 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #18229 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #13662 completed with status: 'FAILURE'

@LinPoly LinPoly force-pushed the user/pengyunl/serve_pyt_logprobs branch from 26dd798 to 6c1dc41 on September 10, 2025 06:03
@LinPoly (Collaborator Author) commented Sep 10, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #18307 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #18307 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #13731 completed with status: 'FAILURE'

@LinPoly (Collaborator Author) commented Sep 10, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #18336 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #18336 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #13752 completed with status: 'FAILURE'

@LinPoly (Collaborator Author) commented Sep 10, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #18365 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #18365 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13774 completed with status: 'FAILURE'

@LinPoly (Collaborator Author) commented Sep 12, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #18461 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #18461 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13850 completed with status: 'SUCCESS'

@LinPoly LinPoly merged commit c2bc39a into NVIDIA:main Sep 12, 2025
5 checks passed
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025
MrGeva pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Sep 21, 2025
