Section 01
Introduction / Main Post: LanteRn: A New Framework for Structured Visual Reasoning in Latent Space
LanteRn enables multimodal models to perform direct visual reasoning in the latent space. By generating continuous visual thought embeddings, it demonstrates more refined visual understanding capabilities on benchmarks such as VisCoT, V*, and Blink.