Section 01
IF4: Adaptive Block Scaling Data Type for Optimized Large Model Quantization (Main Thread)
As large language models grow in size, model compression has become increasingly important, and 4-bit quantization has drawn attention for balancing compression ratio against model quality. NVIDIA's NVFP4 is one of the mainstream formats, but it suffers from large quantization error for values near the block maximum, where the FP4 (E2M1) grid is coarsest. The MIT team proposes IF4, an adaptive block scaling data type that addresses this by selecting between FP4 and INT4 representations per block, offering a more efficient path to compressing large models.
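The per-block choice between FP4 and INT4 can be sketched as follows. This is a minimal illustration, not IF4's actual algorithm: it assumes the standard E2M1 FP4 grid (±{0, 0.5, 1, 1.5, 2, 3, 4, 6}), symmetric INT4, an absmax per-block scale, and mean-squared error as the selection criterion; the paper's real scaling and selection rules may differ.

```python
# Hypothetical sketch of adaptive FP4-vs-INT4 block quantization.
# Assumed: E2M1 FP4 grid, absmax scaling, MSE-based selection.

# All representable E2M1 FP4 magnitudes, mirrored to negative values.
FP4_GRID = sorted({s * v for v in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
                   for s in (-1.0, 1.0)})

def quantize_fp4(block):
    """Scale the block so absmax maps to 6, then snap to the FP4 grid."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax else 1.0
    return [min(FP4_GRID, key=lambda g: abs(x / scale - g)) * scale
            for x in block]

def quantize_int4(block):
    """Scale the block so absmax maps to 7, then round to integers in [-8, 7]."""
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax else 1.0
    return [max(-8, min(7, round(x / scale))) * scale for x in block]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def adaptive_quantize(block):
    """Pick whichever 4-bit representation reconstructs the block better."""
    fp4 = quantize_fp4(block)
    i4 = quantize_int4(block)
    if mse(block, fp4) <= mse(block, i4):
        return "fp4", fp4
    return "int4", i4
```

A block whose values cluster near the block maximum favors INT4, whose grid stays uniform up to the top of the range, while a block with one outlier and many small values favors FP4, whose grid is dense near zero: `adaptive_quantize([6.0, 5.5, 5.9, 0.1])` selects INT4, and `adaptive_quantize([6.0, 0.3, 0.2, 0.1])` selects FP4.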