Zing Forum

Detecting Social Stereotypes in Artworks of the Prado Museum Using Multimodal Large Models

This article introduces a computational framework that combines multimodal large language models with the SADCAT dictionary scoring system to automatically detect social stereotypes in artworks of the Prado Museum, offering a new approach to the digital analysis and ethical review of cultural heritage.

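Tags: multimodal large models, stereotype detection, computational humanities, art history, Prado Museum, SADCAT, computer vision, cultural heritage digitization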
Published 2026-04-02 05:13 | Recent activity 2026-04-02 05:21 | Estimated read 7 min

Section 01

[Introduction] Detecting Social Stereotypes in Artworks of the Prado Museum Using Multimodal Large Models

The framework pairs multimodal large language models with the SADCAT dictionary scoring system to detect social stereotypes in the Prado Museum's artworks automatically. It addresses a gap in traditional art analysis, which struggles to identify and quantify stereotypes systematically across a whole collection, and it uses computational humanities methods to automate the analysis of large-scale visual data.

Section 02

Research Background and Motivation

Art history research has long relied on the subjective interpretation of humanities scholars, which makes it hard to systematically identify and quantify the stereotypes about gender, race, social class, and similar attributes that are embedded in museum collections. The Prado Museum's collection spans several centuries; it carries rich cultural information and may also reflect the biases of the periods in which the works were made. Traditional methods do not scale to large volumes of visual data, and multimodal large language models offer a new way to address this. Combining computer vision with natural language processing enables automated, scalable content analysis of artworks.

Section 03

Technical Framework and Analysis of the SADCAT Scoring Mechanism

The project's core computation rests on three components: multimodal large language models, the SADCAT dictionary scoring system, and the Stereotype Content Model as the theoretical framework. The multimodal models (such as BLIP-2, LLaVA, and DeepSeek) extract visual features and generate descriptive text, turning visual information into a linguistic representation. SADCAT then scores that text along the two dimensions of the Stereotype Content Model, Warmth and Competence. It matches vocabulary against a multidisciplinary dictionary and computes scores that take term frequency, grammatical role, and semantic weight into account, which allows fine-grained analysis.
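
As a rough illustration of dictionary-based scoring on a generated caption, the sketch below counts lexicon hits per dimension and normalizes by caption length. It is a minimal sketch under stated assumptions, not the project's implementation: `TOY_LEXICON` and `score_caption` are hypothetical names, the six entries are invented for illustration and are not taken from the SADCAT dictionaries, and only term frequency is modeled (grammatical role and semantic weight are left out).

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> (dimension, direction).
# These entries are illustrative only and are NOT the SADCAT dictionaries.
TOY_LEXICON = {
    "kind": ("warmth", 1),
    "friendly": ("warmth", 1),
    "cruel": ("warmth", -1),
    "skilled": ("competence", 1),
    "powerful": ("competence", 1),
    "helpless": ("competence", -1),
}


def score_caption(caption, lexicon=TOY_LEXICON):
    """Return length-normalized warmth and competence scores for one caption."""
    # Lowercase and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", caption.lower())
    counts = Counter(tokens)
    totals = {"warmth": 0.0, "competence": 0.0}
    for word, (dimension, direction) in lexicon.items():
        # Each occurrence adds +1 or -1 to its dimension.
        totals[dimension] += direction * counts[word]
    n = max(len(tokens), 1)  # avoid division by zero on empty captions
    return {dim: value / n for dim, value in totals.items()}


if __name__ == "__main__":
    print(score_caption("A kind and skilled woman"))
```

Normalizing by token count keeps long and short captions comparable; a fuller scorer would presumably replace the fixed +1/-1 directions with per-term weights.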

Section 04

Data Processing Flow and Experimental Design Verification

Data processing proceeds in stages: data auditing (cleaning and preprocessing the collection metadata); three parallel model inference pipelines, in which BLIP-2, LLaVA, and DeepSeek each generate image descriptions; LLaVA model verification, which compares model output with manual annotations to evaluate accuracy; comprehensive data analysis, which integrates the model outputs and applies SADCAT scoring; and museum-level macro analysis. The experiment uses several verification strategies: comparing the outputs of the different models, manually reviewing samples to calculate agreement coefficients, using cross-validation to evaluate stability, and running controlled experiments on how prompt wording affects model output.
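
The text does not say which agreement coefficient is computed during manual review. One common choice is Cohen's kappa, which corrects raw agreement between two label sequences for the agreement expected by chance. The sketch below assumes that choice; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels: model output vs. a human annotator on four images.
    model = ["female", "female", "male", "male"]
    human = ["female", "female", "male", "female"]
    print(cohens_kappa(model, human))
```

Here the raters agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.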

Section 05

Application Value and Ethical Considerations

The study has both academic and social value. It demonstrates the potential of computational humanities in art history research and opens a path to large-scale analysis of visual culture. It also provides data to support museum curation and public education, helping to reveal and reflect on implicit biases in cultural heritage. There are ethical challenges as well: automated detection can produce false positives and false negatives, and the black-box nature of the models obscures the basis for their judgments. The study therefore emphasizes human-machine collaboration, in which AI tools assist researchers rather than replace them and machine results are reviewed and interpreted by professional scholars.

Section 06

Future Development Directions

The project's open-source implementation provides a reusable technical foundation for related research. Future work could extend it to other museums and other types of artwork, develop finer-grained stereotype classification systems, explore additional multimodal model architectures, and build larger manually annotated datasets. The approach could also be applied to contemporary art, combined with audience behavior data to study differences in how stereotypes are perceived. As multimodal AI advances, computational humanities is likely to benefit further.