-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[None][doc] add a guide for modifying APIs #7866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Superjomn
merged 1 commit into
NVIDIA:release/1.0
from
Superjomn:doc.add-developer-guide-api
Sep 22, 2025
+229
−0
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,228 @@ | ||
# LLM API Change Guide | ||
|
||
This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API. | ||
|
||
## Overview | ||
|
||
TensorRT LLM provides multiple API levels: | ||
|
||
1. **LLM API** - The highest-level API (e.g., the `LLM` class) | ||
2. **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class) | ||
|
||
This guide focuses on the LLM API, which is the primary interface for most users. | ||
|
||
## API Types and Stability Guarantees | ||
|
||
TensorRT LLM classifies APIs into two categories: | ||
|
||
### 1. Committed APIs | ||
- **Stable** and guaranteed to remain consistent across releases | ||
- No breaking changes without major version updates | ||
- Schema stored in: `tests/unittest/api_stability/references_committed/` | ||
|
||
### 2. Non-committed APIs | ||
- Under active development and may change between releases | ||
- Marked with a `status` field in the docstring: | ||
- `prototype` - Early experimental stage | ||
- `beta` - More stable but still subject to change | ||
- `deprecated` - Scheduled for removal | ||
- Schema stored in: `tests/unittest/api_stability/references/` | ||
- See [API status documentation](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html) for complete details | ||
|
||
## API Schema Management | ||
|
||
All API schemas are: | ||
- Stored as YAML files in the codebase | ||
- Protected by unit tests in `tests/unittest/api_stability/` | ||
- Automatically validated to ensure consistency | ||
|
||
## Modifying LLM Constructor Arguments | ||
|
||
The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic dataclass called `LlmArgs`. | ||
|
||
### Architecture | ||
|
||
- The LLM's `__init__` method parameters map directly to `LlmArgs` fields | ||
- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`) | ||
- All arguments are validated and type-checked through Pydantic | ||
|
||
### Adding a New Argument | ||
|
||
Follow these steps to add a new constructor argument: | ||
|
||
#### 1. Add the field to `TorchLlmArgs` | ||
|
||
```python | ||
garbage_collection_gen0_threshold: int = Field( | ||
default=20000, | ||
description=( | ||
"Threshold for Python garbage collection of generation 0 objects. " | ||
"Lower values trigger more frequent garbage collection." | ||
), | ||
status="beta" # Required for non-committed arguments | ||
) | ||
``` | ||
Superjomn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
**Field requirements:** | ||
- **Type annotation**: Required for all fields | ||
- **Default value**: Recommended unless the field is mandatory | ||
- **Description**: Clear explanation of the parameter's purpose | ||
- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.) | ||
|
||
#### 2. Update the API schema | ||
|
||
Add the field to the appropriate schema file: | ||
|
||
- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml` | ||
```yaml | ||
garbage_collection_gen0_threshold: | ||
type: int | ||
default: 20000 | ||
status: beta # Must match the status in code | ||
``` | ||
|
||
- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml` | ||
```yaml | ||
garbage_collection_gen0_threshold: | ||
type: int | ||
default: 20000 | ||
# No status field for committed arguments | ||
``` | ||
|
||
#### 3. Run validation tests | ||
|
||
```bash | ||
python -m pytest tests/unittest/api_stability/test_llm_api.py | ||
``` | ||
|
||
## Modifying LLM Class Methods | ||
|
||
Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked. | ||
|
||
### Implementation Details | ||
|
||
- The actual implementation is in the `_TorchLLM` class ([llm.py](https://github.com/NVIDIA/TensorRT-LLM/blob/release/1.0/tensorrt_llm/llmapi/llm.py)) | ||
- Public methods (not starting with `_`) are automatically exposed as APIs | ||
|
||
Superjomn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
### Adding a New Method | ||
|
||
Follow these steps to add a new API method: | ||
|
||
#### 1. Implement the method in `_TorchLLM` | ||
|
||
For non-committed APIs, use the `@set_api_status` decorator: | ||
|
||
```python | ||
@set_api_status("beta") | ||
def generate_with_streaming( | ||
self, | ||
prompts: List[str], | ||
**kwargs | ||
) -> Iterator[GenerationOutput]: | ||
"""Generate text with streaming output. | ||
|
||
Args: | ||
prompts: Input prompts for generation | ||
**kwargs: Additional generation parameters | ||
|
||
Returns: | ||
Iterator of generation outputs | ||
""" | ||
# Implementation here | ||
pass | ||
``` | ||
|
||
For committed APIs, no decorator is needed: | ||
|
||
```python | ||
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput: | ||
"""Generate text from prompts.""" | ||
# Implementation here | ||
pass | ||
``` | ||
|
||
#### 2. Update the API schema | ||
|
||
Add the method to the appropriate `llm.yaml` file: | ||
|
||
**Non-committed API** (`tests/unittest/api_stability/references/llm.yaml`): | ||
```yaml | ||
generate_with_streaming: | ||
status: beta # Must match @set_api_status | ||
parameters: | ||
- name: prompts | ||
type: List[str] | ||
- name: kwargs | ||
type: dict | ||
returns: Iterator[GenerationOutput] | ||
``` | ||
|
||
**Committed API** (`tests/unittest/api_stability/references_committed/llm.yaml`): | ||
```yaml | ||
generate: | ||
parameters: | ||
- name: prompts | ||
type: List[str] | ||
- name: kwargs | ||
type: dict | ||
returns: GenerationOutput | ||
``` | ||
|
||
### Modifying Existing Methods | ||
|
||
When modifying existing methods: | ||
|
||
1. **Non-breaking changes** (adding optional parameters): | ||
- Update the method signature | ||
- Update the schema file | ||
- No status change needed | ||
|
||
2. **Breaking changes** (changing required parameters, return types): | ||
- Only allowed for non-committed APIs | ||
- Consider deprecation path for beta APIs | ||
- Update documentation with migration guide | ||
|
||
### Best Practices | ||
|
||
1. **Documentation**: Always include comprehensive docstrings | ||
2. **Type hints**: Use proper type annotations for all parameters and returns | ||
3. **Testing**: Add unit tests for new methods | ||
4. **Examples**: Provide usage examples in the docstring | ||
5. **Validation**: Run API stability tests before submitting changes | ||
|
||
### Running Tests | ||
|
||
Validate your changes: | ||
|
||
```bash | ||
# Run API stability tests | ||
python -m pytest tests/unittest/api_stability/ | ||
|
||
# Run specific test for LLM API | ||
python -m pytest tests/unittest/api_stability/test_llm_api.py -v | ||
``` | ||
|
||
## Common Workflows | ||
|
||
### Promoting an API from Beta to Committed | ||
|
||
1. Remove the `@set_api_status("beta")` decorator from the method | ||
2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/` | ||
3. Remove the `status` field from the schema | ||
4. Update any documentation referring to the API's beta status | ||
|
||
### Deprecating an API | ||
|
||
1. Add `@set_api_status("deprecated")` to the method | ||
2. Update the schema with `status: deprecated` | ||
3. Add deprecation warning in the method: | ||
```python | ||
import warnings | ||
warnings.warn( | ||
"This method is deprecated and will be removed in v2.0. " | ||
"Use new_method() instead.", | ||
DeprecationWarning, | ||
stacklevel=2 | ||
) | ||
``` | ||
4. Document the migration path |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.