
[BugFix][Codegen, CUDA] Fix faulty codegen for FP8 #17673


Open · wants to merge 6 commits into main

Conversation

AntonMoberg commented Feb 24, 2025

Fixed a bug where CUDA codegen produces faulty code when a vectorizable BufferLoadNode contains a Float8 type.

Codegen generated the invalid expression make___nv_fp8x2_e5m2(param_0[v_.x], param_0[v_.y]), where "param_0" is of type __nv_fp8_e5m2* __restrict__.

This commit adds a missing is_float8() check to CodeGenCUDA::PrintVecElemLoadExpr, which is called for vectorizable BufferLoadNodes. With the check in place, codegen instead correctly emits __nv_fp8x2_e5m2(make_float2(static_cast<float>(param_0[v_.x]), static_cast<float>(param_0[v_.y]))).

Additionally, this commit removes the added "make_" prefix for float8 in CodeGenCUDA::PrintVecConstructor, as the correct way to instantiate an __nv_fp8x2_[e5m2/e4m3] is through the __nv_fp8x2_[e5m2/e4m3] constructor itself.
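For illustration, here is a minimal, hedged sketch of the before/after shape of the generated load. The names param_0 and v_ come from the snippet above; the wrapper function load_fp8_pair is purely hypothetical and only exists to make the fragment self-contained, since the real TVM-generated kernels embed this expression in much larger functions.

```cuda
#include <cuda_fp8.h>

// Hypothetical wrapper around the generated expression, for illustration only.
__device__ __nv_fp8x2_e5m2 load_fp8_pair(const __nv_fp8_e5m2* __restrict__ param_0,
                                         int2 v_) {
  // Before the fix, codegen emitted a call to a non-existent helper:
  //   make___nv_fp8x2_e5m2(param_0[v_.x], param_0[v_.y])
  // After the fix, each element is widened to float, packed into a float2,
  // and passed to the __nv_fp8x2_e5m2 constructor:
  return __nv_fp8x2_e5m2(make_float2(static_cast<float>(param_0[v_.x]),
                                     static_cast<float>(param_0[v_.y])));
}
```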

tqchen (Member) commented Feb 24, 2025

Thanks @AntonMoberg. @MasterJH5574, it would be great if we could validate this PR.

MasterJH5574 (Contributor) commented

Thank you @AntonMoberg! Would you mind providing an example that reproduces the error?

AntonMoberg (Author) commented

Hi @tqchen & @MasterJH5574! I am trying to produce a minimal reproducible example, but it is proving a bit challenging as the error only occurs in some specific scenarios. In the meantime I have encountered more faulty codegen related to FP8. I'll get back to you with updates ASAP :)

AntonMoberg (Author) commented

I am converting this PR to a draft while I work on fleshing it out for more cases. Will provide basic tests and suggested fixes along the way!

MasterJH5574 (Contributor) commented

Thank you so much @AntonMoberg!

AntonMoberg force-pushed the main branch 3 times, most recently from 8e6a786 to 6f3f13c on March 10, 2025 at 09:42
AntonMoberg marked this pull request as ready for review on March 10, 2025 at 09:52
AntonMoberg (Author) commented

@MasterJH5574 @tqchen This should be ready for review now. I am not 100% familiar with this side of the codebase, so please make sure I am not making any silly mistakes and that this doesn't break anything else.

Also feel free to make edits if something looks fishy :)

AntonMoberg force-pushed the main branch 5 times, most recently from 43b578b to e8d0e04 on March 10, 2025 at 16:03
AntonMoberg (Author) commented

There we go, it should now be good to go! Had some rebase issues, so sorry for spamming updates.

This commit adds tests for the FP8 codegen and compilation of the generated code for the most common operators in LLMs and CNNs. Tested operators are: Matmul, Conv2d, Maxpool2d, Add, Relu, Gelu, GeluTanh, Sigmoid, Silu, Softmax.
Vectorized FP8 values are stored as __nv_[fp8x2/fp8x4]_[e5m2/e4m3] (i.e., packed 16-bit/32-bit storage types). These types do not have overloaded binary operators (such as *). This commit adds the ability to apply such operators by extracting the high and low lanes, statically casting them to float, performing the operation, and then repacking the results into the dual-lane type.
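A minimal sketch of that unpack, float math, repack pattern for a two-lane FP8 value, assuming the conversion operators and constructors provided by cuda_fp8.h; the helper name fp8x2_mul is illustrative and not what TVM emits.

```cuda
#include <cuda_fp8.h>

// Illustrative helper: multiply two packed fp8x2 values lane-wise.
__device__ __nv_fp8x2_e5m2 fp8x2_mul(__nv_fp8x2_e5m2 a, __nv_fp8x2_e5m2 b) {
  // Unpack both lanes to float via the explicit float2 conversion operator.
  float2 af = static_cast<float2>(a);
  float2 bf = static_cast<float2>(b);
  // Perform the arithmetic in float, then repack into the dual-lane FP8 type.
  return __nv_fp8x2_e5m2(make_float2(af.x * bf.x, af.y * bf.y));
}
```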
Non-vectorized FP8 values are stored as __nv_fp8_[e5m2/e4m3] types. These types do not support binary operators either, because FP8 arithmetic has to be performed in wider registers. This commit adds binary operator support by doing the operations in __half instead of FP8 (i.e., cast up to 16 bits, perform the operation, then cast back down to 8 bits).
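A minimal sketch of that widen-to-__half pattern for scalar FP8, again assuming the cuda_fp8.h / cuda_fp16.h conversions; the helper name fp8_add is illustrative.

```cuda
#include <cuda_fp8.h>
#include <cuda_fp16.h>

// Illustrative helper: add two scalar FP8 values by going through __half.
__device__ __nv_fp8_e5m2 fp8_add(__nv_fp8_e5m2 a, __nv_fp8_e5m2 b) {
  __half ah = static_cast<__half>(a);   // cast up to 16-bit
  __half bh = static_cast<__half>(b);
  return __nv_fp8_e5m2(__hadd(ah, bh)); // operate in __half, cast back to 8-bit
}
```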
Add missing support for unary operators on FP8. FP8 requires casting to FP16 to perform mathematical operations; this commit handles the casting to and from __half and adds the missing checks for the TIR intrinsics so that the correct operator signatures are generated.
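The same pattern applies to a unary intrinsic, sketched here with exp (relevant for e.g. Softmax); assumptions as above, and fp8_exp is an illustrative name.

```cuda
#include <cuda_fp8.h>
#include <cuda_fp16.h>

// Illustrative helper: exp of a scalar FP8 value computed in __half.
__device__ __nv_fp8_e4m3 fp8_exp(__nv_fp8_e4m3 x) {
  __half xh = static_cast<__half>(x);  // cast up to 16-bit
  return __nv_fp8_e4m3(hexp(xh));      // half-precision exp, cast back to 8-bit
}
```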
AntonMoberg (Author) commented

Ping! @tqchen @MasterJH5574

MasterJH5574 (Contributor) commented

Thank you @AntonMoberg so much for the update! Will take a look!
