Section 01
[Introduction] RAM Coffers: An Innovative Architecture for 8.8x Speedup in CPU-side LLM Inference
RAM Coffers is an open-source project that targets IBM POWER8 servers. Using techniques such as a NUMA-distributed weight bank architecture and resonance routing, it reaches 147 tokens/sec for CPU-side LLM inference without a GPU, 8.8x faster than standard llama.cpp (which implies a baseline of roughly 17 tokens/sec). The result marks a significant gain in hardware utilization efficiency and shows the potential of traditional CPUs for LLM inference.