Section 01
Multimodal Large Language Models Playing Tetris: Benchmark Tests Reveal the True Capabilities of Visual Reasoning
An open-source project called "Models Playing Tetris" has multimodal large language models, including GPT-4V, Gemini Pro Vision, and LLaVA-13b, play Tetris. This gives it a systematic way to evaluate their visual understanding and spatial reasoning. The project also offers a $200 prize to encourage the community to improve prompting strategies. Its results provide experimental data on the current limits of AI visual reasoning.