Section 01
Miru: Making the Reasoning of Multimodal AI Visible and Transparent (Introduction)
Miru is an open-source, FastAPI-based tool for tracing the reasoning of multimodal models. It is designed to address the "black box" problem of models such as GPT-4V and Claude 3. Miru generates step-by-step reasoning trajectories, marks the image regions or text passages that each reasoning step relies on, and provides an interactive attention visualization. Together, these features aim to make AI systems more interpretable and more trustworthy.
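To make the idea of a step-by-step trajectory with per-step grounding concrete, here is a minimal sketch of how such a trace could be represented. Miru's actual schema is not described in this section; every class and method name below (`ImageRegion`, `TextSpan`, `ReasoningStep`, `ReasoningTrace`, `add_step`, `ungrounded_steps`, `to_dict`) is hypothetical and chosen only for illustration. The sketch assumes evidence is either a rectangular image region in pixel coordinates or a character span in a text passage.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Tuple, Union


@dataclass
class ImageRegion:
    """Hypothetical evidence type: a box (x0, y0, x1, y1) in pixel coordinates."""
    image_id: str
    box: Tuple[int, int, int, int]


@dataclass
class TextSpan:
    """Hypothetical evidence type: a character span [start, end) in a passage."""
    doc_id: str
    start: int
    end: int


# A step may rely on image regions, text spans, or a mix of both.
Evidence = Union[ImageRegion, TextSpan]


@dataclass
class ReasoningStep:
    index: int                      # 1-based position in the trajectory
    claim: str                      # what the model asserts at this step
    evidence: List[Evidence] = field(default_factory=list)


@dataclass
class ReasoningTrace:
    question: str
    steps: List[ReasoningStep] = field(default_factory=list)

    def add_step(self, claim: str, *evidence: Evidence) -> ReasoningStep:
        """Append a step, numbering it after the existing steps."""
        step = ReasoningStep(index=len(self.steps) + 1, claim=claim,
                             evidence=list(evidence))
        self.steps.append(step)
        return step

    def ungrounded_steps(self) -> List[int]:
        """Indices of steps that cite no image region or text span."""
        return [s.index for s in self.steps if not s.evidence]

    def to_dict(self) -> dict:
        """Convert to plain dicts and lists, e.g. for a JSON response."""
        return asdict(self)


if __name__ == "__main__":
    trace = ReasoningTrace(question="How many people are holding umbrellas?")
    trace.add_step("Locate the people in the photo.",
                   ImageRegion("img-1", (0, 0, 640, 480)))
    trace.add_step("Two of them are holding umbrellas.",
                   ImageRegion("img-1", (120, 40, 260, 300)),
                   ImageRegion("img-1", (400, 60, 520, 310)))
    trace.add_step("So the answer is 2.")
    for step in trace.steps:
        print(step.index, step.claim, len(step.evidence))
    print("ungrounded:", trace.ungrounded_steps())
```

Keeping evidence as explicit, typed records per step makes it easy to flag steps that cite nothing (here, the final step), and `to_dict()` yields plain dicts and lists that a JSON API layer such as FastAPI can serialize.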