NVIDIA · Superjomn · Sep 22, 2025 · Sep 19, 2025
@@ -0,0 +1,228 @@
+# LLM API Change Guide
+
+This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.
+
+## Overview
+
+TensorRT LLM provides multiple API levels:
+
+1. **LLM API** - The highest-level API (e.g., the `LLM` class)
+2. **PyExecutor API** - The mid-level API (e.g., the `PyExecutor` class)
+
+This guide focuses on the LLM API, which is the primary interface for most users.
+
+## API Types and Stability Guarantees
+
+TensorRT LLM classifies APIs into two categories:
+
+### 1. Committed APIs
+- **Stable** and guaranteed to remain consistent across releases
+- No breaking changes without major version updates
+- Schema stored in: `tests/unittest/api_stability/references_committed/`
+
+### 2. Non-committed APIs
+- Under active development and may change between releases
+- Marked with a `status` field in the docstring:
+  - `prototype` - Early experimental stage
+  - `beta` - More stable but still subject to change
+  - `deprecated` - Scheduled for removal
+- Schema stored in: `tests/unittest/api_stability/references/`
+- See [API status documentation](https://nvidia.github.io/TensorRT-LLM/llm-api/reference.html) for complete details
+
+## API Schema Management
+
+All API schemas are:
+- Stored as YAML files in the codebase
+- Protected by unit tests in `tests/unittest/api_stability/`
+- Automatically validated to ensure consistency 
+
+## Modifying LLM Constructor Arguments
+
+The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic dataclass called `LlmArgs`.
+
+### Architecture
+
+- The LLM's `__init__` method parameters map directly to `LlmArgs` fields
+- `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)
+- All arguments are validated and type-checked through Pydantic
+
+### Adding a New Argument
+
+Follow these steps to add a new constructor argument:
+
+#### 1. Add the field to `TorchLlmArgs`
+
+```python
+garbage_collection_gen0_threshold: int = Field(
+    default=20000,
+    description=(
+        "Threshold for Python garbage collection of generation 0 objects. "
+        "Lower values trigger more frequent garbage collection."
+    ),
+    status="beta"  # Required for non-committed arguments
+)
+```
+
+**Field requirements:**
+- **Type annotation**: Required for all fields
+- **Default value**: Recommended unless the field is mandatory
+- **Description**: Clear explanation of the parameter's purpose
+- **Status**: Required for non-committed arguments (`prototype`, `beta`, etc.)
+
+#### 2. Update the API schema
+
+Add the field to the appropriate schema file:
+
+- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml`
+  ```yaml
+  garbage_collection_gen0_threshold:
+    type: int
+    default: 20000
+    status: beta  # Must match the status in code
+  ```
+
+- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml`
+  ```yaml
+  garbage_collection_gen0_threshold:
+    type: int
+    default: 20000
+    # No status field for committed arguments
+  ```
+
+#### 3. Run validation tests
+
+```bash
+python -m pytest tests/unittest/api_stability/test_llm_api.py
+```
+
+## Modifying LLM Class Methods
+
+Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.
+
+### Implementation Details
+
+- The actual implementation is in the `_TorchLLM` class ([llm.py](https://github.com/NVIDIA/TensorRT-LLM/blob/release/1.0/tensorrt_llm/llmapi/llm.py))
+- Public methods (not starting with `_`) are automatically exposed as APIs
+
+### Adding a New Method
+
+Follow these steps to add a new API method:
+
+#### 1. Implement the method in `_TorchLLM`
+
+For non-committed APIs, use the `@set_api_status` decorator:
+
+```python
+@set_api_status("beta")
+def generate_with_streaming(
+    self, 
+    prompts: List[str], 
+    **kwargs
+) -> Iterator[GenerationOutput]:
+    """Generate text with streaming output.
+
+    Args:
+        prompts: Input prompts for generation
+        **kwargs: Additional generation parameters
+
+    Returns:
+        Iterator of generation outputs
+    """
+    # Implementation here
+    pass
+```
+
+For committed APIs, no decorator is needed:
+
+```python
+def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
+    """Generate text from prompts."""
+    # Implementation here
+    pass
+```
+
+#### 2. Update the API schema
+
+Add the method to the appropriate `llm.yaml` file:
+
+**Non-committed API** (`tests/unittest/api_stability/references/llm.yaml`):
+```yaml
+generate_with_streaming:
+  status: beta  # Must match @set_api_status
+  parameters:
+    - name: prompts
+      type: List[str]
+    - name: kwargs
+      type: dict
+  returns: Iterator[GenerationOutput]
+```
+
+**Committed API** (`tests/unittest/api_stability/references_committed/llm.yaml`):
+```yaml
+generate:
+  parameters:
+    - name: prompts
+      type: List[str]
+    - name: kwargs
+      type: dict
+  returns: GenerationOutput
+```
+
+### Modifying Existing Methods
+
+When modifying existing methods:
+
+1. **Non-breaking changes** (adding optional parameters):
+   - Update the method signature
+   - Update the schema file
+   - No status change needed
+
+2. **Breaking changes** (changing required parameters, return types):
+   - Only allowed for non-committed APIs
+   - Consider deprecation path for beta APIs
+   - Update documentation with migration guide
+
+### Best Practices
+
+1. **Documentation**: Always include comprehensive docstrings
+2. **Type hints**: Use proper type annotations for all parameters and returns
+3. **Testing**: Add unit tests for new methods
+4. **Examples**: Provide usage examples in the docstring
+5. **Validation**: Run API stability tests before submitting changes
+
+### Running Tests
+
+Validate your changes:
+
+```bash
+# Run API stability tests
+python -m pytest tests/unittest/api_stability/
+
+# Run specific test for LLM API
+python -m pytest tests/unittest/api_stability/test_llm_api.py -v
+```
+
+## Common Workflows
+
+### Promoting an API from Beta to Committed
+
+1. Remove the `@set_api_status("beta")` decorator from the method
+2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`
+3. Remove the `status` field from the schema
+4. Update any documentation referring to the API's beta status
+
+### Deprecating an API
+
+1. Add `@set_api_status("deprecated")` to the method
+2. Update the schema with `status: deprecated`
+3. Add deprecation warning in the method:
+   ```python
+   import warnings
+   warnings.warn(
+       "This method is deprecated and will be removed in v2.0. "
+       "Use new_method() instead.",
+       DeprecationWarning,
+       stacklevel=2
+   )
+   ```
+4. Document the migration path
@@ -83,6 +83,7 @@ Welcome to TensorRT LLM's Documentation!
    developer-guide/perf-benchmarking.md
    developer-guide/ci-overview.md
    developer-guide/dev-containers.md
+   developer-guide/api-change.md
 
 
 .. .. toctree::