
Commit 972f781

Update graphs for h100
1 parent 0537aa8

File tree

3 files changed (+10, -0 lines)


_posts/2024-08-07-flexattention.md (+10 lines)

@@ -439,6 +439,16 @@ FlexAttention achieves 90% of FlashAttention2's performance in the forward pass
 
 ![flexattention speed chart](/assets/images/flexattention/fg16.png){:style="width:100%"}
 
+FlexAttention shines on H100 GPUs, where it's not just natively supported but actually outperforms FlashAttention2! While it doesn't quite reach the heights of FlashAttention3, FlexAttention still packs a punch:
+
+- Forward pass: 85% of FlashAttention3's performance
+- Backward pass: 76% of FlashAttention3's performance
+
+![flexattention speed chart](/assets/images/flexattention/fg17.png){:style="width:100%"}
+![flexattention speed chart](/assets/images/flexattention/fg18.png){:style="width:100%"}
+
+
+
 ## Conclusion
 
 We hope you have as much fun using FlexAttention as we did developing it! While working on this, we ended up finding way more applications of this API than we could have expected. We’ve already seen it accelerate torchtune’s [sample packing throughput by 71%](https://github.com/pytorch/torchtune/pull/1193), replace the need for a researcher to spend over a week writing their own custom Triton kernel, and deliver competitive performance with custom handwritten attention variants.
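For context on what these H100 benchmarks exercise, here is a minimal sketch of calling the flex_attention API the post describes, assuming a recent PyTorch build where torch.nn.attention.flex_attention is available; the tensor shapes and the relative-bias score_mod are illustrative, not the benchmark configuration:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Illustrative shapes: (batch, heads, seq_len, head_dim); not the benchmark config
q, k, v = (
    torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)

def relative_bias(score, b, h, q_idx, kv_idx):
    # score_mod hook: add a relative positional bias to each attention score
    return score + (q_idx - kv_idx)

# torch.compile fuses the score_mod into a single FlashAttention-style kernel,
# which is where the speedups discussed above come from
flex_attention = torch.compile(flex_attention)
out = flex_attention(q, k, v, score_mod=relative_bias)
```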

assets/images/flexattention/fg17.png (212 KB)

assets/images/flexattention/fg18.png (211 KB)
