Section 01
Introduction: Execution via Routing—A New Compiler Paradigm for Replacing Costly Inference with Small Models
An innovative compiler technology that allows small models to execute complex algorithms using precomputed routing paths, drastically cutting LLM inference costs while preserving execution accuracy. Addressing the high cost of large model inference, this paradigm proposes a shift from "inference" to "routing", enabling small models to efficiently perform complex tasks by selecting precomputed paths—combining cost advantages with accuracy.