diff --git a/_posts/2024-10-28-unleashing-ai-mobile.md b/_posts/2024-10-28-unleashing-ai-mobile.md
index 66c487d66147..2e703736337d 100644
--- a/_posts/2024-10-28-unleashing-ai-mobile.md
+++ b/_posts/2024-10-28-unleashing-ai-mobile.md
@@ -2,6 +2,7 @@
 layout: blog_detail
 title: "Unleashing the Power of AI on Mobile: LLM Inference for Llama 3.2 Quantized Models with ExecuTorch and KleidiAI"
 author: Gian Marco Iodice, Arm and Digant Desai, Meta
+excerpt: "At the recent PyTorch Conference, Arm highlighted the widespread impact of its technology, spanning from cloud to edge, and emphasized its commitment to delivering advanced AI computing capabilities seamlessly to millions of developers worldwide."
 ---
 
 ## Introduction
diff --git a/_posts/2024-11-01-cutlass-ping-pong-gemm-kernel.md b/_posts/2024-11-01-cutlass-ping-pong-gemm-kernel.md
index 4bbc3203072f..796ed4ac81e1 100644
--- a/_posts/2024-11-01-cutlass-ping-pong-gemm-kernel.md
+++ b/_posts/2024-11-01-cutlass-ping-pong-gemm-kernel.md
@@ -2,6 +2,7 @@
 layout: blog_detail
 title: "Deep Dive on CUTLASS Ping-Pong GEMM Kernel"
 author: Less Wright, Adnan Hoque
+excerpt: "In this post, we provide an overview of the CUTLASS Ping-Pong GEMM kernel, with relevant FP8 inference kernel benchmarking."
 ---
 
 ![Figure 1. FP8 GEMM Throughput Comparison CUTLASS vs Triton](/assets/images/cutlass-ping-pong-gemm-kernel/fg1.png){:style="width:100%"}
diff --git a/_posts/2024-11-21-rebellions.md b/_posts/2024-11-21-rebellions.md
index ce7ef260441b..b8941a003d83 100644
--- a/_posts/2024-11-21-rebellions.md
+++ b/_posts/2024-11-21-rebellions.md
@@ -1,6 +1,7 @@
 ---
 layout: blog_detail
 title: "Rebellions Joins the PyTorch Foundation as a General Member"
+excerpt: "The PyTorch Foundation, a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem, is announcing today that Rebellions has joined as a general member."
 ---
 
 ![Rebellions logo](/assets/images/rebellions-logo.svg){:style="max-width:350px;width:100%;float:right;margin: 20px;"}
diff --git a/_posts/2024-11-25-training-using-float8-fsdp2.md b/_posts/2024-11-25-training-using-float8-fsdp2.md
index 64d0ac39e6fa..ea81f892f0c7 100644
--- a/_posts/2024-11-25-training-using-float8-fsdp2.md
+++ b/_posts/2024-11-25-training-using-float8-fsdp2.md
@@ -2,6 +2,7 @@
 layout: blog_detail
 title: "Supercharging Training using float8 and FSDP2"
 author: "IBM and Meta"
+excerpt: "In this blog, we will demonstrate how we achieve up to 50% throughput speedup over FSDP1 bf16 training while maintaining loss and evaluation benchmark parity."
 ---
 
 **IBM**: Tuan Hoang Trong, Alexei Karve, Yan Koyfman, Linsong Chu, Divya Kumari, Shweta Salaria, Robert Walkup, Praneet Adusumilli, Nirmit Desai, Raghu Ganti, Seetharami Seelam