Section 01
[Introduction] Discriminative Hidden State Readout: A New Paradigm for Sentiment Analysis with Multimodal Large Models
This paper proposes Discriminative Hidden State Readout, a new paradigm for the Multimodal Sentiment Analysis (MSA) task, and shows that it is both more accurate and more efficient than conventional generative decoding. The study builds on Qwen2.5-Omni-7B, the natively omni-modal model from Alibaba's Tongyi Qianwen (Qwen) family. It attaches a lightweight regression head that predicts a continuous sentiment score directly from the model's hidden states. Combined with 4-bit quantization and QLoRA fine-tuning, this setup can be trained on a consumer-grade GPU. Key findings: generative readout suffers from precision loss and output-parsing failures, whereas discriminative readout achieves state-of-the-art (SOTA) performance on the CMU-MOSI and CMU-MOSEI datasets. The original paper is on arXiv (June 4, 2026): http://arxiv.org/abs/2606.05713v1.
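The following is a minimal, stdlib-only Python sketch that contrasts the two readout styles. It is not the paper's implementation, and several parts are illustrative assumptions:

- the regex-based answer parser;
- masked mean pooling (the paper may instead read a single token's hidden state);
- the single linear layer and the squared-error loss;
- every name used: `parse_generated_score`, `masked_mean_pool`, `RegressionHead`, `train_step`.

Toy 2-dimensional vectors stand in for the real hidden states of Qwen2.5-Omni-7B. The clamp to [-3, 3] reflects the CMU-MOSI/MOSEI sentiment label range.

```python
import re
from typing import List, Optional, Sequence

# CMU-MOSI/MOSEI sentiment labels lie in [-3, 3].
SCORE_MIN, SCORE_MAX = -3.0, 3.0

# ---- Generative readout: the model writes text and a number is parsed out ----
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_generated_score(text: str) -> Optional[float]:
    """Return the first number in a decoded answer, clamped to the label range.

    Returns None when the answer contains no number (a parsing failure).
    """
    match = _NUMBER.search(text)
    if match is None:
        return None
    value = float(match.group())
    return min(max(value, SCORE_MIN), SCORE_MAX)


# ---- Discriminative readout: a regression head on hidden states ----
def masked_mean_pool(hidden_states: Sequence[Sequence[float]],
                     attention_mask: Sequence[int]) -> List[float]:
    """Average the hidden vectors of non-padding positions (mask == 1)."""
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no positions")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


class RegressionHead:
    """Hypothetical lightweight head: one linear layer mapping a pooled
    hidden state to a single continuous sentiment score."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights = list(weights)
        self.bias = bias

    def __call__(self, pooled: Sequence[float]) -> float:
        # The output is already a real number: no text decoding, no parsing.
        return sum(w * x for w, x in zip(self.weights, pooled)) + self.bias


def train_step(head: RegressionHead, pooled: Sequence[float],
               target: float, lr: float = 0.01) -> float:
    """One gradient-descent step on squared error (an assumed loss).

    Returns the loss measured before the update.
    """
    error = head(pooled) - target
    for i, x in enumerate(pooled):
        head.weights[i] -= lr * 2.0 * error * x  # dL/dw_i = 2 * error * x_i
    head.bias -= lr * 2.0 * error                # dL/db   = 2 * error
    return error ** 2


if __name__ == "__main__":
    for answer in ["The sentiment score is 1.8.", "Mostly positive.", "Score: 5"]:
        print(repr(answer), "->", parse_generated_score(answer))

    hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # toy 2-d hidden states
    mask = [1, 1, 0]                                    # last position is padding
    pooled = masked_mean_pool(hidden, mask)
    head = RegressionHead([0.5, -0.25], bias=0.1)
    for step in range(3):
        print(f"step {step}: loss = {train_step(head, pooled, target=1.0):.4f}")
```

The two paths behave differently:

- **Generative path:** an answer with no number yields `None` and has to be retried or discarded. An out-of-range answer must be clamped after the fact.
- **Regression head:** it always returns a real-valued score, and that score can be optimized directly against the label.

In a real pipeline the head would read the backbone's hidden states, with the 4-bit quantized backbone adapted through QLoRA. This sketch leaves that part out.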