Section 01
Tattletale: A Guide to a High-Performance, Cross-Platform Multimodal LLM Inference Engine
Tattletale is a high-performance multimodal LLM inference engine written in Nim. It aims to resolve the usual trade-off between performance and portability in large language model inference, delivering fast inference with true cross-platform compatibility. It supports multiple backends, including CUDA, OpenCL, Vulkan, and WebGPU, and features an innovative IntrusiveAttention cache mechanism, EXL3 quantization support, and Lean4 formal verification.