Section 01
HOI-MLLM Project Overview: Open-World Human-Object Interaction Detection Driven by Multimodal Large Language Models
The HOI-MLLM project combines multimodal large language models (MLLMs) with chain-of-thought (CoT) reasoning to perform open-world human-object interaction (HOI) detection, moving beyond the fixed label sets of traditional closed-set approaches. By casting detection as a generative task with interpretable reasoning steps, the project targets the open-ended variety of interaction types found in real-world scenes, where no predefined verb vocabulary can cover every possible interaction.
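To make the generative paradigm concrete, here is a minimal sketch of what open-vocabulary HOI output could look like: instead of predicting indices into a fixed verb list, the MLLM emits free-form reasoning followed by angle-bracketed triplets, which are then parsed into structured results. The triplet format, the `HOITriplet` type, and the `parse_hoi_output` helper are illustrative assumptions, not the project's actual interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HOITriplet:
    """An open-vocabulary <human, verb, object> triplet.

    Verbs and objects are free-form strings rather than indices into a
    fixed label set, which is what lets a generative model describe
    interactions never seen during training.
    """
    human: str
    verb: str
    object: str


def parse_hoi_output(generated_text: str) -> list[HOITriplet]:
    """Extract triplets of the form <human, verb, object> from model output.

    Assumes the MLLM ends its chain-of-thought with one angle-bracketed
    group per detected interaction; everything before is free-form
    reasoning that can be surfaced to the user for interpretability.
    """
    pattern = re.compile(r"<\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*>")
    return [HOITriplet(h, v, o) for h, v, o in pattern.findall(generated_text)]


# Hypothetical MLLM output: CoT reasoning followed by structured triplets.
output = (
    "The person's hands grip the board and their feet leave the ground, "
    "so the interaction is skateboarding. "
    "<person, riding, skateboard> <person, wearing, helmet>"
)
triplets = parse_hoi_output(output)
```

Because the verb slot is an arbitrary string, a novel interaction such as `<person, dusting, drone>` parses just as well as one from a training vocabulary, which is the essence of the open-world claim.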