Pulse · pytorch/ao · GitHub

June 12, 2025 – June 19, 2025

Overview

46 Active pull requests

9 Active issues

29 Pull requests merged by 17 people

Enable cpp kernel building
#2402 merged Jun 19, 2025
Replace debug handle with from_node to trace operator transformation
#2339 merged Jun 18, 2025
[float8 moe training] make using triton kernels for per-group scaling configurable
#2405 merged Jun 18, 2025
Add part 2 of end-to-end tutorial: fine-tuning
#2394 merged Jun 18, 2025
Fix ruff broken on main
#2404 merged Jun 18, 2025
fix torchao quantized model in fbcode
#2396 merged Jun 18, 2025
[BE] Convert quant_primitives methods private
#2350 merged Jun 18, 2025
Delete Galore
#2397 merged Jun 18, 2025
Add inplace quantizer examples
#2345 merged Jun 18, 2025
Update index.rst
#2395 merged Jun 17, 2025
Add pt2e tutorials to torchao doc page
#2384 merged Jun 17, 2025
deduplicate torch ao debugger tests between pytorch/ao and ExecuTorch
#2390 merged Jun 17, 2025
[float8 training] update torchtitan benchmark script args
#2392 merged Jun 17, 2025
turn off building tests with cpuinfo
#2324 merged Jun 17, 2025
remove torchao dependency from torchao build script
#2383 merged Jun 17, 2025
remove rocm source files when not building for rocm
#2385 merged Jun 16, 2025
Revamp README
#2374 merged Jun 16, 2025
Test PARQ with torchao activation quantization
#2370 merged Jun 16, 2025
[sparse] remove superblock
#2381 merged Jun 16, 2025
Skip a couple tests to unbreak CI
#2382 merged Jun 16, 2025
[docs] Replace deprecated configs with Config objects
#2375 merged Jun 16, 2025
Add test case generator for groupwise low bit LUT based quantization
#2359 merged Jun 13, 2025
[ci] fix pt2e x86 unit tests
#2371 merged Jun 13, 2025
Add dynamic quantization support to gemlite layout
#2327 merged Jun 13, 2025
make float8 training's force_recompute_fp8_weight_in_bwd flag do nothing
#2356 merged Jun 13, 2025
fixing trunk - autoquant test failure
#2363 merged Jun 12, 2025
fixing ruff format for trunk
#2369 merged Jun 12, 2025
[float8] Add fnuz fp8 dtypes to Float8Layout
#2351 merged Jun 12, 2025
Fix 2:4 sparsify meta registrations
#2366 merged Jun 12, 2025

17 Pull requests opened by 14 people

Gemlite generate.py fix
#2372 opened Jun 13, 2025
Testing claude documentation
#2373 opened Jun 13, 2025
[float8] Prevent quantize_affine_float8/dequantize_affine_float8 decomposed on inductor
#2379 opened Jun 16, 2025
[CPU INT8 SDPA] use manual transpose and pack
#2380 opened Jun 16, 2025
[not for land] float8 blockwise scaling training prototype using deep_gemm
#2386 opened Jun 16, 2025
Add support for resharding and int4 preshuffle kernel
#2387 opened Jun 16, 2025
Enables the per_tensor lowering patterns for weight per_packing
#2391 opened Jun 17, 2025
Unskip tests
#2398 opened Jun 18, 2025
[WIP] Add AWQ quantization with QDQLayout support for ExecuTorch
#2399 opened Jun 18, 2025
[WIP] Make AWQ more general
#2400 opened Jun 18, 2025
Align scale dtype with model precision in GPTQ
#2403 opened Jun 18, 2025
Improve tiling params to speed up prefill
#2406 opened Jun 18, 2025
Groupwise low bit LUT based model quantization.
#2407 opened Jun 18, 2025
WIP NVfp4
#2408 opened Jun 18, 2025
Fix cache dir import
#2409 opened Jun 18, 2025
[float8] add auto_filter_for_recipe to float8
#2410 opened Jun 18, 2025
[Inductor] Support scaled mm on inductor
#2411 opened Jun 19, 2025

1 Issue closed by 1 person

[Windows][build]two Build failure on Windows on latest main branch
#2297 closed Jun 16, 2025

8 Issues opened by 7 people

TorchAO Paper
#2412 opened Jun 19, 2025
TP + FSDP + MXFP8 fails during compile
#2393 opened Jun 17, 2025
Implement an AWQ algorithm with dynamic activation quantization for ExecuTorch
#2388 opened Jun 16, 2025
Support cpu wheel with cpp kernels
#2378 opened Jun 16, 2025
Question about the choice of use_fast_accum in FP8 Training
#2377 opened Jun 14, 2025
bug: int8 w8a8 doesn't work on 5090
#2376 opened Jun 14, 2025
Remove old subclass APIs that are no longer needed and are creating a maintenance burden
#2368 opened Jun 12, 2025
TorchAO ROCM tests are taking a long time
#2367 opened Jun 12, 2025

15 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Inference tutorial - Part 3 of e2e series [WIP]
#2343 commented on Jun 18, 2025 • 20 new comments
[CPU] Enable DA8W4 on CPU
#2128 commented on Jun 15, 2025 • 10 new comments
Build mxfp4 kernel for sm120a
#2285 commented on Jun 18, 2025 • 9 new comments
skip quant/dequant decomposed
#2299 commented on Jun 19, 2025 • 8 new comments
Add Claude MD file
#2311 commented on Jun 18, 2025 • 5 new comments
Add _apply_fn_to_data method to TorchAOBaseTensor base class
#2365 commented on Jun 13, 2025 • 1 new comment
[roadmap/tracker] Low precision training for MoEs
#2147 commented on Jun 16, 2025 • 0 new comments
BF16 stochastic rounding does not work distributed (FSDP)
#2296 commented on Jun 18, 2025 • 0 new comments
[Question] Combining QAT and Sparsity Training
#2310 commented on Jun 18, 2025 • 0 new comments
[low-bit optim] Add coat for float8 optimizer
#1231 commented on Jun 15, 2025 • 0 new comments
Add round_scales_to_power_of_2 option for float quantization
#2323 commented on Jun 18, 2025 • 0 new comments
moe quant with dedicated kernels [wip]
#2325 commented on Jun 19, 2025 • 0 new comments
[WIP] FSDP support for MoE training
#2357 commented on Jun 19, 2025 • 0 new comments
Update to new PT Theme
#2361 commented on Jun 17, 2025 • 0 new comments
[not for land] checking ROCM test length issue
#2364 commented on Jun 12, 2025 • 0 new comments