LocalLens's technology stack is built around three core capabilities, all implemented via the Kronk SDK:
Local Runtime Initialization
Kronk initializes the llama.cpp runtime and automatically detects the available processor backends. LocalLens supports several backends, including CPU, CUDA, Vulkan, and Metal. On first launch, the application can automatically download the required llama.cpp library files, which simplifies setup for the user.
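One way to picture backend detection is as a preference-ordered probe that falls back to the CPU. The sketch below is illustrative only: the `Backend` type, the preference order, and the `probe` callback are hypothetical names introduced here, not Kronk's API, and the source does not describe how Kronk actually performs detection.

```go
package main

import "fmt"

// Backend names a llama.cpp compute backend.
// Hypothetical type for illustration; not part of the Kronk SDK.
type Backend string

const (
	CPU    Backend = "cpu"
	CUDA   Backend = "cuda"
	Vulkan Backend = "vulkan"
	Metal  Backend = "metal"
)

// pickBackend returns the first backend in preference order for which
// probe reports true, and falls back to CPU when none is available.
func pickBackend(preference []Backend, probe func(Backend) bool) Backend {
	for _, b := range preference {
		if probe(b) {
			return b
		}
	}
	return CPU
}

func main() {
	// Assumed order: try GPU backends before falling back to CPU.
	preference := []Backend{CUDA, Metal, Vulkan}

	// Stand-in probe: pretend only Vulkan is present on this machine.
	probe := func(b Backend) bool { return b == Vulkan }

	fmt.Println("selected backend:", pickBackend(preference, probe))
}
```

Passing the probe in as a function keeps the selection logic testable without any GPU present.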
Model Management
Kronk handles downloading and managing local model files. LocalLens currently uses two models that work together:
- Visual Language Model: Used to generate text descriptions of images
- Embedding Model: Converts text descriptions and search queries into searchable vectors
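This dual-model architecture splits the work: the visual language model is responsible for understanding image content, and the embedding model is responsible for efficient semantic matching between descriptions and queries.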
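The division of labor between the two models can be sketched as a small indexing step: describe the image, then embed the description. The `Describer` and `Embedder` interfaces, the `IndexedImage` record, and the fake implementations below are hypothetical stand-ins for illustration; they are not Kronk's types, and the file path is made up.

```go
package main

import (
	"errors"
	"fmt"
)

// Describer stands in for the visual language model (hypothetical interface).
type Describer interface {
	Describe(imagePath string) (string, error)
}

// Embedder stands in for the embedding model (hypothetical interface).
type Embedder interface {
	Embed(text string) ([]float32, error)
}

// IndexedImage is the record kept for each image after indexing.
type IndexedImage struct {
	Path        string
	Description string
	Vector      []float32
}

// indexImage runs the two models in sequence: image -> description -> vector.
func indexImage(path string, d Describer, e Embedder) (IndexedImage, error) {
	desc, err := d.Describe(path)
	if err != nil {
		return IndexedImage{}, fmt.Errorf("describe %s: %w", path, err)
	}
	if desc == "" {
		return IndexedImage{}, errors.New("empty description for " + path)
	}
	vec, err := e.Embed(desc)
	if err != nil {
		return IndexedImage{}, fmt.Errorf("embed %s: %w", path, err)
	}
	return IndexedImage{Path: path, Description: desc, Vector: vec}, nil
}

// fakeDescriber and fakeEmbedder return canned values so the sketch runs
// without any model loaded.
type fakeDescriber struct{}

func (fakeDescriber) Describe(string) (string, error) {
	return "a dog on a beach", nil
}

type fakeEmbedder struct{}

func (fakeEmbedder) Embed(text string) ([]float32, error) {
	// Toy vector derived from the text length; a real model returns
	// a high-dimensional embedding.
	return []float32{float32(len(text)), 1}, nil
}

func main() {
	img, err := indexImage("photos/dog.jpg", fakeDescriber{}, fakeEmbedder{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(img.Path, "->", img.Description, "->", len(img.Vector), "dims")
}
```

Keeping the two roles behind separate interfaces means either model could be swapped without touching the indexing code.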
Local Inference Execution
All inference runs on the user's device. After Kronk loads the models, it converts images into text descriptions via its visual API, then converts these descriptions and the user's search query into vectors via the embedding API. Because every step runs locally, images and search queries never leave the user's machine, which protects the user's privacy.