
Commit a1595a3

Andrew (#1749)

* space addition
* add image 3, code color test
* finished code coloring

1 parent 8dcec6e

1 file changed: +4 −0 lines

_posts/2024-09-26-pytorch-native-architecture-optimization.md
@@ -32,15 +32,19 @@ Below we'll walk through some of the techniques available in torchao you can app

[Our inference quantization algorithms](https://github.com/pytorch/ao/tree/main/torchao/quantization) work over arbitrary PyTorch models that contain nn.Linear layers. Weight-only and dynamic activation quantization for various dtypes and sparse layouts can be chosen using our top-level quantize_ API:
```py
from torchao.quantization import (
    quantize_,
    int4_weight_only,
)
quantize_(model, int4_weight_only())
```
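
The snippet above assumes a `model` already exists. A minimal end-to-end sketch might look like the following (the toy nn.Sequential, the shapes, and the CUDA/bfloat16 setup are illustrative assumptions, since int4 weight-only kernels generally expect bf16 weights on GPU):

```py
import torch
from torch import nn
from torchao.quantization import quantize_, int4_weight_only

# Hypothetical toy model; quantize_ targets the nn.Linear layers inside it
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
model = model.to(device="cuda", dtype=torch.bfloat16)

# Swap each nn.Linear's weight for an int4 weight-only representation in place
quantize_(model, int4_weight_only())

out = model(torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16))
```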
Sometimes quantizing a layer can make it slower because of overhead, so if you'd rather we just pick how to quantize each layer in a model for you, you can instead run:

```py
model = torchao.autoquant(torch.compile(model, mode='max-autotune'))
```
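
A usage note, hedged: autoquant settles on a quantization method per layer by benchmarking candidates when the model runs on real inputs, so feed it a representative batch afterwards (the shape and dtype below are assumptions for illustration):

```py
# Running a representative input lets autoquant benchmark candidate
# kernels for each layer and commit to the fastest choice.
model(torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16))
```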
The quantize_ API has a few different options depending on whether your model is compute-bound or memory-bound.
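
As an illustrative sketch of that choice (the model names are placeholders; int4_weight_only and int8_dynamic_activation_int8_weight are among the options torchao.quantization exposes):

```py
from torchao.quantization import (
    quantize_,
    int4_weight_only,
    int8_dynamic_activation_int8_weight,
)

# Memory-bound (e.g., small-batch decoding): weight-only quantization
# shrinks weight traffic while activations stay in bf16.
quantize_(memory_bound_model, int4_weight_only())

# Compute-bound (e.g., large batches): dynamically quantize activations
# as well, so the matmuls themselves run in int8.
quantize_(compute_bound_model, int8_dynamic_activation_int8_weight())
```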
