Section 01
Introduction: ONNX Runtime GenAI, a Cross-Platform Large Language Model Inference Engine and Edge Deployment Solution
ONNX Runtime GenAI is Microsoft's open-source library for running large language models, aimed at two problems: inference performance and deployment flexibility. Built on the mature ONNX Runtime, it implements the full generative AI loop, including preprocessing and postprocessing, KV caching, and constrained decoding. It supports cross-platform deployment and acceleration on multiple hardware backends, so developers can run large models efficiently on consumer devices and focus on application-layer innovation.
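To make the "generative AI loop" concrete, here is a minimal toy sketch in plain Python. It does not use the ONNX Runtime GenAI API. Every name in it (VOCAB, toy_model, generate, the allowed set) is hypothetical and exists only for illustration. The "cache" is a simple list standing in for a real key/value tensor cache, and the "model" is a deterministic rule rather than a neural network.

```python
from typing import List, Set

# Hypothetical four-token vocabulary, for illustration only.
VOCAB = ["<eos>", "yes", "no", "maybe"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode(text: str) -> List[int]:
    """Preprocessing: turn whitespace-separated words into token ids."""
    return [TOKEN_ID[word] for word in text.split()]


def decode(ids: List[int]) -> str:
    """Postprocessing: turn token ids back into text."""
    return " ".join(VOCAB[i] for i in ids)


def toy_model(new_ids: List[int], cache: List[int]) -> List[float]:
    """Stand-in for a forward pass (an assumption, not a real model).

    Only the new tokens are passed in; earlier tokens are already held in
    the cache, which is the idea behind KV caching. The returned logits
    favor the token id that follows the last token seen.
    """
    cache.extend(new_ids)
    logits = [0.0] * len(VOCAB)
    logits[(cache[-1] + 1) % len(VOCAB)] = 1.0
    return logits


def generate(prompt: str, allowed: Set[int], max_new_tokens: int = 8) -> str:
    cache: List[int] = []
    pending = encode(prompt)  # the first step feeds the whole prompt
    output: List[int] = []
    for _ in range(max_new_tokens):
        logits = toy_model(pending, cache)
        # Constrained decoding: disallowed tokens can never be selected.
        masked = [
            score if i in allowed else float("-inf")
            for i, score in enumerate(logits)
        ]
        # Greedy search: pick the highest-scoring allowed token.
        next_id = max(range(len(masked)), key=masked.__getitem__)
        if next_id == TOKEN_ID["<eos>"]:
            break
        output.append(next_id)
        pending = [next_id]  # later steps feed only the newest token
    return decode(output)
```

Calling generate("yes", allowed={0, 1, 2, 3}) lets the toy model emit any token, while allowed={0, 1, 2} forbids "maybe" and changes what is generated. A real engine applies the same steps to tensors and a trained model, with sampling strategies beyond the greedy choice used here.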