Section 01
Original Post: Local AI Inference Stack on Apple Silicon — Low-Latency Multimodal Inference via the oMLX and asr-router Dual Services
This project introduces a complete local AI inference stack built on Apple Silicon and the MLX framework. Through a dual-service architecture — the oMLX gateway and asr-router — it delivers a full set of AI capabilities: large language models, speech recognition, text embeddings, OCR, and multimodal visual understanding. The stack supports low-latency inference and real-time transcription, exposes OpenAI-compatible REST APIs to minimize migration costs for developers, and fully exploits the hardware advantages of Apple Silicon.
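Because the gateway speaks the OpenAI-compatible REST API, existing OpenAI clients can simply be pointed at the local endpoint. The sketch below builds a standard `/v1/chat/completions` request body; the base URL and model name are placeholders for illustration, not values confirmed by this document:

```python
import json

# Hypothetical local endpoint and model name -- adjust to your oMLX setup.
BASE_URL = "http://localhost:8080/v1"
MODEL = "local-model"

# An OpenAI-compatible chat-completions request body: the same JSON shape
# the official OpenAI API accepts, so off-the-shelf clients work unchanged.
payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Hello from a local MLX stack!"},
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

In practice you would not build the JSON by hand: the official `openai` Python client accepts a `base_url` argument, so pointing it at the local gateway (e.g. `OpenAI(base_url=BASE_URL, api_key="not-needed")`) is enough to reuse existing application code.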