Multi-backend Parallel Support
IntelliM supports multiple backends installed side by side. Users can maintain Vulkan, SYCL (Intel oneAPI), CUDA, or ROCm builds of llama.cpp on the same system and switch between them with a simple command-line parameter. All backends are defined in the builds.conf registry; adding a new backend requires just one line of configuration. This design suits workstations with GPUs from multiple vendors, or scenarios that need to switch between development and production environments.
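The registry layout is not shown in this section; a plausible builds.conf, assuming a simple name=path format (the paths and the one-line-per-backend layout below are illustrative assumptions), might look like:

```
# builds.conf (hypothetical layout: backend name = path to that llama.cpp build)
vulkan=/opt/llama.cpp-vulkan/build/bin
sycl=/opt/llama.cpp-sycl/build/bin
cuda=/opt/llama.cpp-cuda/build/bin
rocm=/opt/llama.cpp-rocm/build/bin
```

Adding another backend would then be a single new line mapping a name to a build directory.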
Intelligent Interactive Mode
Running intellm without parameters launches an interactive wizard that guides the user through:
1. Selecting a backend (from registered builds)
2. Choosing a mode (chat, server, bench)
3. Selecting a model (local GGUF or Hugging Face download)
4. Configuring the context window (auto-reads the model's maximum context length)
5. Selecting KV cache precision (f16, q8_0, q4_0)
6. Enabling the prompt cache
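The wizard's answers presumably map onto ordinary command-line parameters; a non-interactive invocation covering the same six choices might look like the sketch below (every flag name except --prompt-cache, which this document mentions, is an assumption, as is the model filename):

```shell
# Hypothetical non-interactive equivalent of the wizard's six steps
# (flag names are assumptions; only --prompt-cache appears in this document)
intellm --backend vulkan \
        --mode chat \
        --model ./models/example-7b-q4_k_m.gguf \
        --ctx 8192 \
        --kv-cache q8_0 \
        --prompt-cache
```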
Named Configuration System
IntelliM allows saving common configurations as named files in configs/ (key-value format). Users load a preset via intellm --config &lt;name&gt;. If no config is specified, default.conf loads automatically; the --interactive flag bypasses the default and enters interactive mode.
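The exact key names used in configs/ are not documented here; the following is a minimal, runnable sketch of how a key-value preset of this kind could be parsed in shell (the keys BACKEND, MODE, and CTX_SIZE are hypothetical, as is the parsing logic):

```shell
# Minimal sketch: load a key=value config file into shell variables.
# The file format and key names below are assumptions, not IntelliM's actual parser.
parse_config() {
  # Read key=value pairs, skipping blank lines and '#' comments
  while IFS='=' read -r key value; do
    case "$key" in ''|'#'*) continue ;; esac
    export "$key=$value"
  done < "$1"
}

# Example preset in the assumed key-value format
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
# demo preset (hypothetical keys)
BACKEND=vulkan
MODE=server
CTX_SIZE=8192
EOF

parse_config "$conf_file"
echo "$BACKEND $MODE $CTX_SIZE"   # prints: vulkan server 8192
```

The same approach extends naturally to a default.conf fallback: try the named file first, then fall back to configs/default.conf if none was given.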
Persistent Prompt Cache
For repetitive tasks (code assistant, document QA), IntelliM provides a persistent prompt cache stored under KVCACHE_DIR (enabled via --prompt-cache), which avoids re-processing an unchanged prompt prefix and so reduces preprocessing time and improves response latency.
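In practice this means the first run pays the full prompt-processing cost and later runs with the same prefix start from the saved KV state. A usage sketch (the KVCACHE_DIR path and the --config preset name are illustrative assumptions; only --prompt-cache and KVCACHE_DIR come from this document):

```shell
# First run: processes the full prompt and writes KV state under KVCACHE_DIR
export KVCACHE_DIR="$HOME/.cache/intellm/kv"    # hypothetical location
intellm --prompt-cache --config code-assist     # 'code-assist' is a hypothetical preset

# Subsequent runs with the same prompt prefix reuse the cached state
intellm --prompt-cache --config code-assist
```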