pytorch · LinuxAzzamean · Sep 26, 2024 · Sep 25, 2024
diff --git a/_posts/2024-09-25-pytorch-native-architecture-optimization.md b/_posts/2024-09-25-pytorch-native-architecture-optimization.md
@@ -1,12 +1,9 @@
 ---
 layout: blog_detail
-title: "CUDA-Free Inference for LLMs"
+title: "PyTorch Native Architecture Optimization: torchao"
 author: Team PyTorch
 ---
 
-# PyTorch Native Architecture Optimization: torchao  
-
-By Team PyTorch
 
 We’re happy to officially launch torchao, a PyTorch native library that makes models faster and smaller by leveraging low bit dtypes, quantization and sparsity. [torchao](https://github.com/pytorch/ao) is an accessible toolkit of techniques written (mostly) in easy to read PyTorch code spanning both inference and training. This blog will help you pick which techniques matter for your workloads.
 
@@ -61,15 +58,11 @@ from torchao.quantization import (
     float8\_dynamic\_activation\_float8\_weight,  
 )
 
-![](/assets/images/Figure_1.png){:style="width:100%"}
 
-<<<<<<< HEAD:_posts/2024-09-25-pytorch-native-architecture-optimization.md
-We also have extensive benchmarks on diffusion models in collaboration with the HuggingFace diffusers team in [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao) where we demonstrated 53.88% speedup on Flux.1-Dev and 27.33% speedup on CogVideoX-5b 
-=======
+We also have extensive benchmarks on diffusion models in collaboration with the HuggingFace diffusers team in [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao) where we demonstrated 53.88% speedup on Flux.1-Dev and 27.33% speedup on CogVideoX-5b
+
 ![](/assets/images/Figure_1.png){:style="width:100%"}
 
-We also have extensive benchmarks on diffusion models in collaboration with the HuggingFace diffusers team in [diffusers-torchao](https://github.com/sayakpaul/diffusers-torchao) where we demonstrated 53.88% speedup on Flux.1-Dev and 27.33% speedup on CogVideoX-5b 
->>>>>>> 97898699f7101b847da377106274783ced03bb3d:_posts/2024-09-25-pytorch-native-architecture-optimizaion.md
 
 Our APIs are composable so we’ve for example composed sparsity and quantization to bring 5% [speedup for ViT-H inference](https://github.com/pytorch/ao/tree/main/torchao/sparsity)