Section 01
Infer-App: A Native macOS Local LLM App Integrating Voice, RAG, and an Agent Runtime
Infer-App is a native local LLM chat application built for macOS that integrates two inference engines: llama.cpp and MLX. It supports on-device speech recognition, retrieval-augmented generation (RAG), and an agent runtime based on the Model Context Protocol (MCP). Its key advantages are fully offline operation for privacy, a native macOS experience, and a flexible technical architecture, making it a one-stop local AI assistant.
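To make the MCP-based agent runtime concrete: MCP carries tool invocations as JSON-RPC 2.0 messages, where the client asks a server to run a named tool via the `tools/call` method. The sketch below shows the general request shape; the tool name `web_search` and its arguments are hypothetical, not part of Infer-App:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "web_search",
    "arguments": { "query": "llama.cpp Metal backend" }
  }
}
```

The server replies with a result object containing the tool's output, which the agent runtime can feed back into the model's context for the next inference step.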