[#7308] [feat] AutoDeploy: graph-less transformers mode for HF #7635
Merged: lucaslie merged 25 commits into NVIDIA:main from nv-auto-deploy:ll/haoguo/transformers_mode on Sep 18, 2025
Commits:
8663e21 feat: insert cached attn for transformers mode (h-guo18)
af41e7e Add transformers.yaml; load weights for factory_model (h-guo18)
4da28dc minor: clean irrelevant (h-guo18)
c8950b5 address part of review comments; (h-guo18)
3931bac address review comments (h-guo18)
d8e080a add sharding; refine insert_cache by adding profiler; (h-guo18)
2ed8bb7 polish: use list instead of ptr for shape collection (h-guo18)
e796d40 feat: flexible cached attn for transformers mode (h-guo18)
37a74c1 configurable default yaml or mode field substitung default yaml (lucaslie)
450c44f transformers mode refined (lucaslie)
f3d0448 transformers mode refined (lucaslie)
0b46c8c transformers mode refined (lucaslie)
e74aeaf VLM debugging (lucaslie)
6864333 better model_kwargs and from_pretrained init (lucaslie)
5c02398 transformers+graph refined with args/kwargs handling (lucaslie)
975d023 config fixes (lucaslie)
5dcba38 reviewer feedback and unit tests (lucaslie)
6bc7a66 more reviewer feedback (lucaslie)
b72f41d Merge branch 'main' into ll/haoguo/transformers_mode (lucaslie)
a92e160 correct handling of mistral3 factory (lucaslie)
d20ff7e Merge branch 'main' into ll/haoguo/transformers_mode (lucaslie)
f6a35f3 Merge remote-tracking branch 'upstream/main' into ll/haoguo/transform… (lucaslie)
e724615 Merge remote-tracking branch 'upstream/main' into ll/haoguo/transform… (lucaslie)
59aaab6 unit test skip fix (lucaslie)
948727b Merge branch 'main' into ll/haoguo/transformers_mode (lucaslie)
@@ -0,0 +1,33 @@
# This is the set of transforms running in "transformers" mode. In this mode, we hook into the
# HF attention mechanism and replace it with our custom cached attention mechanism.
transforms:
  ############################################################################################
  # BUILD MODEL, LOAD WEIGHTS, AND WRAP IT INTO FAKE GRAPH MODULE
  ############################################################################################
  build_and_load_factory_model:
    stage: factory
    use_strict_forward: false
  ############################################################################################
  # MOVE ARGUMENTS TO DEVICE
  ############################################################################################
  move_inputs_to_device:
    stage: weight_load
  ############################################################################################
  # SWITCH TO CACHED+FLATTENED ATTENTION + INITIALIZE CACHES
  ############################################################################################
  detect_hf_attn_layers:
    stage: cache_init
  transformers_replace_cached_attn:
    stage: cache_init
    attn_backend: flashinfer
  initialize_cache:
    stage: cache_init
  resize_kv_cache:
    stage: cache_init
    args_only: false  # use kwargs instead of args
  ############################################################################################
  # COMPILE MODEL
  ############################################################################################
  forward_with_cached_sequence_interface:
    stage: compile
    args_only: false  # use kwargs instead of args
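
To make the stage layout of the config above easier to see, here is a minimal Python sketch that mirrors the listed transforms as a plain dictionary and groups them by their `stage` field. It does not use the AutoDeploy API: `TRANSFORMS`, `group_by_stage`, and `STAGES` are hypothetical names introduced only for illustration, and the idea that stages run in the order they first appear in the file is an assumption based on the file layout, not something the diff confirms.

```python
# Hypothetical plain-dict mirror of the transforms listed in the YAML above.
# Keys and values are copied from the diff; nothing here calls AutoDeploy.
TRANSFORMS = {
    "build_and_load_factory_model": {"stage": "factory", "use_strict_forward": False},
    "move_inputs_to_device": {"stage": "weight_load"},
    "detect_hf_attn_layers": {"stage": "cache_init"},
    "transformers_replace_cached_attn": {"stage": "cache_init", "attn_backend": "flashinfer"},
    "initialize_cache": {"stage": "cache_init"},
    "resize_kv_cache": {"stage": "cache_init", "args_only": False},
    "forward_with_cached_sequence_interface": {"stage": "compile", "args_only": False},
}


def group_by_stage(transforms):
    """Group transform names by their stage, keeping first-seen order."""
    stages = {}
    for name, cfg in transforms.items():
        stages.setdefault(cfg["stage"], []).append(name)
    return stages


# Assumption: the order in which stages first appear reflects execution order.
STAGES = group_by_stage(TRANSFORMS)

if __name__ == "__main__":
    for stage, names in STAGES.items():
        print(f"{stage}: {', '.join(names)}")
```

Grouping this way shows that four of the seven transforms sit in the `cache_init` stage, which is where the HF attention layers are detected, swapped for the cached attention backend, and given their caches.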
lucaslie marked this conversation as resolved.