Section 01
Introduction: RIS-Kernel — A Sparse Attention Inference Engine for Long Texts on Ordinary CPUs
RIS-Kernel is a model-agnostic sparse attention inference engine. It uses sparse random geometry methods to reduce the cost of self-attention from O(N²) to O(N log N) in the sequence length N. This makes inference over contexts of 65,536 tokens feasible on ordinary CPUs without GPU acceleration, lowering the hardware barrier for long-text large-model applications.
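To give a sense of scale for the complexity claim, the following back-of-the-envelope sketch compares the number of query-key score entries at N = 65,536 under the two growth rates. It is illustrative arithmetic only, not RIS-Kernel code: the function names are hypothetical, the logarithm is assumed to be base 2, constant factors are ignored, and the memory figures assume one attention head with float32 scores that are fully materialized.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: scores stored as float32


def dense_score_entries(n: int) -> int:
    """Score entries for full self-attention: every token attends to every token."""
    return n * n


def sparse_score_entries(n: int) -> int:
    """Score entries if each token attends to about log2(n) others.

    Assumption: base-2 logarithm and a constant factor of 1; a real
    O(N log N) method will have its own constants.
    """
    return n * math.ceil(math.log2(n))


n = 65536
dense = dense_score_entries(n)    # 4,294,967,296 entries
sparse = sparse_score_entries(n)  # 1,048,576 entries
ratio = dense / sparse            # 4096.0

dense_gib = dense * BYTES_PER_FLOAT32 / 2**30    # 16.0 GiB per head
sparse_mib = sparse * BYTES_PER_FLOAT32 / 2**20  # 4.0 MiB per head

print(f"dense:  {dense:,} entries, {dense_gib:.1f} GiB")
print(f"sparse: {sparse:,} entries, {sparse_mib:.1f} MiB")
print(f"reduction factor: {ratio:.0f}x")
```

Under these assumptions the dense score matrix for a single head would occupy 16 GiB, while the N log N budget needs about 4 MiB, which is the gap that makes CPU-only inference at this context length plausible.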