Provides the 8-bit floats (FP8) proposed in FP8 Formats for Deep Learning (Float8_E4M3FN, Float8_E5M2) and 8-bit Numerical Formats for Deep Neural Networks (Float8_E4M3FNUZ, Float8_E5M2FNUZ). The package is mainly for handling data stored in these formats: all floating-point arithmetic is performed in Float32, and the result is converted back to FP8.
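
A minimal usage sketch follows. It assumes the package exports the four FP8 types directly and that they follow the usual Julia AbstractFloat constructor/convert interface; the actual API may differ.

```julia
using DLFP8Types  # assumed to export Float8_E4M3FN, Float8_E5M2, Float8_E4M3FNUZ, Float8_E5M2FNUZ

# Construct FP8 values from Float32 (constructor-style conversion is an
# assumption based on the standard Julia AbstractFloat interface).
x = Float8_E4M3FN(1.5f0)
y = Float8_E4M3FN(0.25f0)

# Arithmetic is carried out in Float32 and the output is converted back
# to the FP8 type, so results are rounded to FP8 precision.
z = x + y          # stored as Float8_E4M3FN

# Convert back to Float32 for inspection.
Float32(z)         # ≈ 1.75f0
```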