mobile-model-SDK currently supports the following models:
MiniCPM-V 4.6 (1.3B): An efficient multimodal model developed by OpenBMB (ModelBest). With only 1.3B parameters, it performs well on visual understanding tasks, particularly OCR (optical character recognition) and UI understanding, where it recognizes text and interface elements in screenshots. It accepts text and image input but does not support audio.
Gemma 4 E2B / E4B: Google's Gemma 4 series, supporting three modalities: text, image, and audio. The E2B and E4B variants are two different parameter scales. Native audio support lets Gemma 4 process voice input directly on the device, enabling speech-to-text conversion and voice-based Q&A.
Notably, the SDK is model-agnostic by design. Developers can load any supported model in GGUF format, and the SDK automatically detects the model's capabilities (vision and audio support) and applies the correct chat template. Adding a new model usually requires no code changes: just add the corresponding GGUF file and its mmproj file.
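
To make the drop-in workflow concrete, here is a minimal Python sketch of how a model directory could be scanned for a weights file and its optional mmproj file. It is not the SDK's API: `ModelFiles`, `is_gguf`, and `locate_model_files` are hypothetical names, and the rule that the projector file has "mmproj" in its filename is an assumption made for illustration. The only format fact relied on is that a GGUF file begins with the four magic bytes `GGUF`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

GGUF_MAGIC = b"GGUF"  # a GGUF file starts with these four bytes


@dataclass
class ModelFiles:
    """Hypothetical container for the files that make up one model."""
    weights: Path            # main language-model GGUF file
    mmproj: Optional[Path]   # multimodal projector file, or None if absent


def is_gguf(path: Path) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with path.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


def locate_model_files(model_dir: Path) -> ModelFiles:
    """Find one weights file and at most one mmproj file in model_dir.

    Assumption: the projector is identified by "mmproj" in its filename.
    """
    candidates = sorted(p for p in model_dir.glob("*.gguf") if is_gguf(p))
    projectors = [p for p in candidates if "mmproj" in p.name.lower()]
    weights = [p for p in candidates if "mmproj" not in p.name.lower()]

    if len(weights) != 1:
        raise ValueError(
            f"expected exactly one weights file in {model_dir}, found {len(weights)}"
        )
    if len(projectors) > 1:
        raise ValueError(f"found more than one mmproj file in {model_dir}")

    return ModelFiles(weights=weights[0], mmproj=projectors[0] if projectors else None)
```

In this sketch a missing mmproj file is not an error; the caller would treat the model as text-only. How the SDK itself decides which modalities a model supports is not shown here.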