Advantages of HBM
HBM (High Bandwidth Memory) delivers far higher bandwidth than conventional DDR memory by stacking DRAM dies vertically (3D stacking) and connecting them over a very wide bus. This helps relieve the memory bandwidth bottleneck in AI workloads.
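To make the effect of the wide bus concrete, the sketch below computes theoretical peak bandwidth as bus width times per-pin data rate. The function name and the two configurations are illustrative assumptions, not figures from the source: a nominal HBM2 stack (1024-bit interface, 2.0 Gb/s per pin) and a single DDR4-3200 channel (64-bit interface, 3.2 Gb/s per pin).

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: (bits per transfer * Gb/s per pin) / 8."""
    return bus_width_bits * gbit_per_s_per_pin / 8


if __name__ == "__main__":
    # Nominal, assumed configurations (not taken from the source text).
    hbm2_stack = peak_bandwidth_gb_per_s(1024, 2.0)   # one HBM2 stack
    ddr4_channel = peak_bandwidth_gb_per_s(64, 3.2)   # one DDR4-3200 channel
    print(f"HBM2 stack:   {hbm2_stack:.1f} GB/s")
    print(f"DDR4 channel: {ddr4_channel:.1f} GB/s")
    print(f"Ratio:        {hbm2_stack / ddr4_channel:.1f}x")
```

These are peak numbers only. Sustained bandwidth depends on the access pattern, which is why the irregular accesses of sparse workloads matter.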
Challenges in Sparse Table Execution
Sparse table execution is dominated by random accesses to non-zero elements and by irregular computation. Optimization techniques developed for dense matrices assume regular, contiguous access, so they are difficult to apply directly. Storage formats, index structures, and computation kernels must instead be designed specifically for sparse data.
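The irregularity described above can be seen in a minimal sketch that uses the compressed sparse row (CSR) format as a stand-in. CSR is a common general-purpose sparse layout chosen here only for illustration. The source does not state which format HASTE uses, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def csr_from_dense(dense: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense 2D table into CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values: Sequence[float], col_idx: Sequence[int],
             row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Multiply a CSR matrix by a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # The number of iterations differs per row (irregular computation).
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read at a data-dependent index (random access).
            y[i] += values[k] * x[col_idx[k]]
    return y


if __name__ == "__main__":
    dense = [
        [5, 0, 0, 2],
        [0, 0, 3, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 4],
    ]
    values, col_idx, row_ptr = csr_from_dense(dense)
    print(csr_spmv(values, col_idx, row_ptr, [1, 2, 3, 4]))  # [13.0, 9.0, 0.0, 17.0]
```

The inner loop length varies from row to row and the reads of `x` jump between positions chosen by `col_idx`, which is what defeats the tiling and prefetching tricks that work for dense matrices.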
HASTE's Innovative Ideas
- Efficient sparse data layout: Optimize how sparse tables are stored in HBM to maximize access efficiency
- Parallel execution strategy: Design parallel computation patterns suited to the HBM architecture
- Memory access optimization: Reduce the performance loss caused by irregular access
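As one hypothetical illustration of how layout and parallel execution can interact, the sketch below spreads the rows of a sparse table across independent memory channels so that each channel holds a similar number of non-zero elements. This is a generic greedy load-balancing heuristic, not HASTE's actual method, which the source does not describe. The function name, the channel count, and the per-row non-zero counts are all assumptions.

```python
import heapq
from typing import List, Sequence


def assign_rows_to_channels(row_nnz: Sequence[int], num_channels: int) -> List[List[int]]:
    """Greedily assign rows to channels, balancing the non-zero count per channel.

    Rows are visited from heaviest to lightest, and each row goes to the
    channel with the smallest load so far.
    """
    # Min-heap of (current load, channel id).
    heap = [(0, c) for c in range(num_channels)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_channels)]
    for row in sorted(range(len(row_nnz)), key=lambda r: -row_nnz[r]):
        load, channel = heapq.heappop(heap)
        assignment[channel].append(row)
        heapq.heappush(heap, (load + row_nnz[row], channel))
    return assignment


if __name__ == "__main__":
    row_nnz = [8, 1, 7, 2, 3, 3]  # assumed non-zeros per row
    for channel, rows in enumerate(assign_rows_to_channels(row_nnz, 2)):
        total = sum(row_nnz[r] for r in rows)
        print(f"channel {channel}: rows {rows}, non-zeros {total}")
```

Balancing by non-zero count rather than by row count keeps every channel busy for roughly the same time, so the aggregate bandwidth of the wide interface is not wasted waiting on one overloaded channel.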