Releases · huggingface/text-generation-inference
v3.3.6
What's Changed
- Add missing backslash by @philsupertramp in #3311
- Revert "feat: bump flake including transformers and huggingface_hub versions" by @drbh in #3323
- fix: remove azure by @drbh in #3325
- Fix mask passed to flashinfer by @danieldk in #3324
- Update iframe sources for streaming demo by @coyotte508 in #3327
- Revert "Revert "feat: bump flake including transformers and huggingfa… by @drbh in #3326
- Revert "feat: bump flake including transformers and huggingface_hub versions" by @drbh in #3330
- Patch version 3.3.6 by @tengomucho in #3329
New Contributors
- @philsupertramp made their first contribution in #3311
- @coyotte508 made their first contribution in #3327
Full Changelog: v3.3.5...v3.3.6
v3.3.5
What's Changed
- [gaudi] Refine rope memory, do not need to keep sin/cos cache per layer by @sywangyi in #3274
- Gaudi: add CI by @baptistecolle in #3160
- [gaudi] Gemma3 sliding window support by @sywangyi in #3280
- xpu lora support by @sywangyi in #3232
- Optimum neuron 0.2.2 by @dacorvo in #3281
- [gaudi] Remove unnecessary reinitialize to HeterogeneousNextTokenChooser to m… by @sywangyi in #3284
- [gaudi] Deepseek v2 mla and add ep to unquantized moe by @sywangyi in #3287
- [gaudi] Fix the CI test errors by @yuanwu2017 in #3286
- Hpu gptq gidx support by @sywangyi in #3297
- Migrate to V2 Pydantic interface by @emmanuel-ferdman in #3262
- Xccl by @sywangyi in #3252
- Multi modality fix by @sywangyi in #3283
- some gptq case could not be handled by ipex. but could be handle by t… by @sywangyi in #3298
- fix outline import issue by @sywangyi in #3282
- HuggingFaceM4/Idefics3-8B-Llama3 crash fix by @sywangyi in #3267
- Optimum neuron 0.3.0 by @tengomucho in #3308
- Disable Cachix pushes by @danieldk in #3312
- chore: prepare version 3.3.5 by @tengomucho in #3314
- feat: bump flake including transformers and huggingface_hub versions by @drbh in #3313
Full Changelog: v3.3.4...v3.3.5
v3.3.4
v3.3.3
Neuron backend update.
What's Changed
- Remove useless packages by @yuanwu2017 in #3253
- Bump neuron SDK version by @dacorvo in #3260
- Perf opt by @sywangyi in #3256
- [gaudi] Vlm rebase and issue fix in benchmark test by @sywangyi in #3263
- Move the _update_cos_sin_cache into get_cos_sin by @yuanwu2017 in #3254
- [Gaudi] Remove optimum-habana by @yuanwu2017 in #3261
- [gaudi] HuggingFaceM4/idefics2-8b issue fix by @sywangyi in #3264
- [Gaudi] Enable Qwen3_moe model by @yuanwu2017 in #3244
- [Gaudi]Fix the integration-test issues by @yuanwu2017 in #3265
- [Gaudi] use pad_token_id to pad input id by @sywangyi in #3268
- chore: prepare release 3.3.3 by @dacorvo in #3269
- [gaudi] Refine logging for Gaudi warmup by @regisss in #3222
- doc: fix README by @dacorvo in #3271
Full Changelog: v3.3.2...v3.3.3
v3.3.2
Gaudi improvements.
What's Changed
- upgrade to new vllm extension ops(fix issue in exponential bucketing) by @sywangyi in #3239
- Nix: switch to hf-nix by @danieldk in #3240
- Add Qwen3 by @yuanwu2017 in #3229
- fp8 compressed_tensors w8a8 support by @sywangyi in #3242
- [Gaudi] Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct by @yuanwu2017 in #3245
- Fix the Llama-4-Maverick-17B-128E crash issue by @yuanwu2017 in #3246
- Prepare for 3.3.2 by @danieldk in #3249
Full Changelog: v3.3.1...v3.3.2
v3.3.1
This release updates TGI to Torch 2.7 and CUDA 12.8.
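A quick way to verify the upgrade from inside the release's container is to query Torch directly (a minimal check, not TGI-specific; the exact patch versions shipped in the image may differ):

```python
import torch

print(torch.__version__)          # expected to report a 2.7.x build
print(torch.version.cuda)         # expected to report 12.8
print(torch.cuda.is_available())  # True when a GPU is visible
```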
What's Changed
- change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in #3217
- adjust the `round_up_seq` logic to align with prefill warmup phase on… by @kaixuanliu in #3224
- Update to Torch 2.7.0 by @danieldk in #3221
- Enable Llama4 for gaudi backend by @yuanwu2017 in #3223
- fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all by @drbh in #3230
- Deepseek r1 by @sywangyi in #3211
- Refine warmup and upgrade to synapse AI 1.21.0 by @sywangyi in #3234
- fix the crash in default ATTENTION path by @sywangyi in #3235
- Switch to punica-sgmv kernel from the Hub by @danieldk in #3236
- move input_ids to hpu and remove disposal of adapter_meta by @sywangyi in #3237
- Prepare for 3.3.1 by @danieldk in #3238
New Contributors
- @kaixuanliu made their first contribution in #3217
Full Changelog: v3.3.0...v3.3.1
v3.3.0
Notable changes
- Prefill chunking for VLMs.
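Prefill chunking splits a long prompt into fixed-size pieces and runs the prefill pass chunk by chunk, reusing the KV cache built so far, which bounds peak activation memory for large multimodal prompts. A minimal sketch of the idea (illustrative only; `model.prefill` is a hypothetical stand-in, not TGI's internal API):

```python
def chunked_prefill(model, input_ids, chunk_size=512):
    """Prefill a long prompt in fixed-size chunks instead of one pass.

    `model.prefill` is a hypothetical method that consumes a token chunk
    and returns the updated KV cache; TGI's real implementation lives in
    its server internals and also handles image inputs for VLMs.
    """
    past_key_values = None
    for start in range(0, len(input_ids), chunk_size):
        chunk = input_ids[start:start + chunk_size]
        past_key_values = model.prefill(chunk, past_key_values=past_key_values)
    return past_key_values
```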
What's Changed
- Fixing Qwen 2.5 VL (32B). by @Narsil in #3157
- Fixing tokenization like https://github.com/huggingface/text-embeddin… by @Narsil in #3156
- Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu by @sywangyi in #3113
- L4 fixes by @mht-sharma in #3161
- setuptools <= 70.0 is vulnerable: CVE-2024-6345 by @Narsil in #3171
- transformers flash llm/vlm enabling in ipex by @sywangyi in #3152
- Upgrading the dependencies in Gaudi backend. by @Narsil in #3170
- Hotfixing gaudi deps. by @Narsil in #3174
- Hotfix gaudi2 with newer transformers. by @Narsil in #3176
- Support flashinfer for Gemma3 prefill by @danieldk in #3167
- Get opentelemetry trace id from request headers instead of creating a new trace by @kozistr in #2648
- Bump `sccache` to 0.10.0 by @alvarobartt in #3179
- Fixing CI by @Narsil in #3184
- Add option to configure prometheus port by @mht-sharma in #3187 (see the note after this list)
- Warmup gaudi backend by @sywangyi in #3172
- Put more wiggle room. by @Narsil in #3189
- Fixing the router + template for Qwen3. by @Narsil in #3200
- Skip `{% generation %}` and `{% endgeneration %}` template handling by @alvarobartt in #3204
- doc typo by @julien-c in #3206
- Pr 2982 ci branch by @drbh in #3046
- fix: bump snaps for mllama by @drbh in #3202
- Update client SDK snippets by @julien-c in #3207
- Fix `HF_HUB_OFFLINE=1` for Gaudi backend by @regisss in #3193
- IPEX support FP8 kvcache/softcap/slidingwindow by @sywangyi in #3144
- forward and tokenize chooser use the same shape by @sywangyi in #3196
- Chunked Prefill VLM by @mht-sharma in #3188
- Prepare for 3.3.0 by @danieldk in #3220
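One change above, #3187, makes the Prometheus metrics port configurable rather than fixed. Once the server is running, the metrics endpoint can be scraped like any Prometheus target (a minimal sketch; the port value is an assumption, and the exact launcher flag spelling should be checked with `text-generation-launcher --help`):

```python
import urllib.request

# Assumes TGI was launched with its metrics exposed on port 9090
# (port configured via the option added in #3187).
with urllib.request.urlopen("http://localhost:9090/metrics") as resp:
    metrics = resp.read().decode()

print("\n".join(metrics.splitlines()[:10]))  # first few Prometheus samples
```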
Full Changelog: v3.2.3...v3.3.0
v3.2.3
Main changes
- Patching Llama 4
What's Changed
- Use ROCM 6.3.1 by @mht-sharma in #3141
- Update transformers to 4.51 by @mht-sharma in #3148
- Gaudi: Add Integration Test for Gaudi Backend by @baptistecolle in #3142
- fix: compute type typo by @oOraph in #3150
- 3.2.3 by @Narsil in #3151
Full Changelog: v3.2.2...v3.2.3
v3.2.2
What's Changed
- Minor fixes. by @Narsil in #3125
- configurable termination timeout by @ErikKaum in #3126
- CI: enable server tests for backends by @baptistecolle in #3128
- Torch 2.6 by @Narsil in #3134
- Gaudi: Fix llava-next and mllama crash issue by @yuanwu2017 in #3127
- nix-v3.2.1 -> v3.2.1-nix by @co42 in #3129
- Gaudi: Use exponential growth to replace BATCH_BUCKET_SIZE by @yuanwu2017 in #3131
- Add llama4 by @mht-sharma in #3145
- Preparing for release. by @Narsil in #3147
Full Changelog: v3.2.1...v3.2.2
v3.2.1
What's Changed
- Update to `kernels` 0.2.1 by @danieldk in #3084
- Router: add `gemma3-text` model type by @danieldk in #3107
- We need gcc during runtime to enable triton to compile kernels. by @Narsil in #3103
- Release of Gaudi Backend for TGI by @baptistecolle in #3091
- Fixing the docker build. by @Narsil in #3108
- Make the Nix-based Docker container work on non-NixOS by @danieldk in #3109
- xpu 2.6 update by @sywangyi in #3051
- launcher: correctly get the head dimension for VLMs by @danieldk in #3116
- Gaudi: Sync TGI with the latest changes from the TGI-Gaudi fork by @baptistecolle in #3117
- Bug Fix: Sliding Window Attention by @mht-sharma in #3112 (see the sketch after the changelog link below)
- Publish nix docker image. by @Narsil in #3122
- Prepare for patch release. by @Narsil in #3124
- Intel docker. by @Narsil in #3121
Full Changelog: v3.2.0...v3.2.1
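For context on the sliding-window fix above (#3112): sliding-window attention lets each query attend only to the most recent `window` key positions rather than the full causal prefix. A minimal mask construction (illustrative only, not TGI's kernel code):

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: causal (key <= query) and the key
    # lies within the last `window` positions before the query.
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).int())  # lower-triangular band of width 3
```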