Spindll is an inference engine for GGUF- and MLX-format models, written natively in Rust and open-sourced by developer Iito. The project name combines "Spindle" and "LL(ama)", reflecting its positioning as a model management and serving engine.
As a single-binary solution, Spindll can pull models from the Ollama registry or Hugging Face, manage local storage, and serve streaming inference over both gRPC and HTTP. It supports loading multiple models concurrently with memory-aware scheduling, uses GPU hardware acceleration, and exposes an OpenAI-compatible API.
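Because the API is OpenAI-compatible, a standard chat completion request should work against a running Spindll instance. The sketch below builds such a request with the Python standard library; the base URL, port, and model name are assumptions, not documented Spindll defaults.

```python
import json
from urllib import request

# Hypothetical local endpoint; Spindll's actual host/port/path may differ.
BASE_URL = "http://localhost:8080/v1"


def chat_payload(model: str, prompt: str, stream: bool = True) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def send_chat(payload: dict) -> bytes:
    """POST the payload to the OpenAI-compatible endpoint."""
    req = request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()


payload = chat_payload("llama3.2:1b", "Hello!")  # model name is illustrative
# send_chat(payload)  # uncomment with a Spindll server running locally
```

Any existing OpenAI client library should work the same way once pointed at the local base URL.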
Notably, on Apple Silicon, Spindll runs MLX-format models natively via Swift bridging, while handling GGUF through llama.cpp, so both backends sit behind a single interface.
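Conceptually, routing a model to the right backend reduces to inspecting its on-disk format. The following is an illustrative sketch only: Spindll's real dispatch logic is in Rust, and the extension-to-backend mapping here is an assumption based on the description above.

```python
from pathlib import Path

# Assumed mapping: GGUF files go to llama.cpp, MLX weights (commonly stored
# as safetensors) go to the MLX backend on Apple Silicon.
BACKENDS = {
    ".gguf": "llama.cpp",
    ".safetensors": "mlx",
}


def pick_backend(model_path: str) -> str:
    """Choose an inference backend from the model file's extension."""
    ext = Path(model_path).suffix.lower()
    try:
        return BACKENDS[ext]
    except KeyError:
        raise ValueError(f"unsupported model format: {ext!r}")
```

On platforms without MLX support, a real implementation would also gate the `mlx` branch on hardware detection rather than file format alone.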