
Detecting Social Stereotypes in Artworks of the Prado Museum Using Multimodal Large Models

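Multimodal large models, stereotype detection, computational humanities, art history, Prado Museum, SADCAT, computer vision, cultural heritage digitization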

Published 2026-04-02 05:13 | Recent activity 2026-04-02 05:21 | Estimated read 7 min

Section 01

[Introduction] Detecting Social Stereotypes in Artworks of the Prado Museum Using Multimodal Large Models

This article introduces a computational framework that combines multimodal large language models with the SADCAT dictionary scoring system to automatically detect social stereotypes in artworks held by the Prado Museum. Traditional art analysis struggles to identify and quantify such stereotypes systematically across a whole collection; the study addresses this with computational humanities methods that automate the analysis of large-scale visual data, and it offers a new approach to the digital analysis and ethical review of cultural heritage.

Section 02

Research Background and Motivation

Art history research has long relied on the subjective interpretation of humanities scholars, which makes it difficult to systematically identify and quantify the social stereotypes (related to gender, race, social class, and so on) hidden in museum collections. The Prado Museum's collection spans several centuries; it carries rich cultural information and may also reflect the biases of the periods in which the works were made. Traditional methods do not scale to visual data of this size, and multimodal large language models offer a new way to address that limitation: combining computer vision with natural language processing enables automated, scalable content analysis of artworks.

Section 03

Technical Framework and Analysis of the SADCAT Scoring Mechanism

The project's core computation rests on three components: multimodal large language models, the SADCAT dictionary scoring system, and the Stereotype Content Model as the theoretical framework. The multimodal models (such as BLIP-2, LLaVA, and DeepSeek) extract visual features and generate descriptive text, converting visual information into a linguistic representation. SADCAT, which is based on the Stereotype Content Model, scores that text along two dimensions, Warmth and Competence. It identifies relevant vocabulary through a multidisciplinary dictionary and computes scores that take term frequency, grammatical role, and semantic weight into account, which allows fine-grained analysis.
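The dictionary-scoring step can be illustrated with a minimal sketch. The tiny lexicon, the +1/-1 direction values, and the `score_caption` function are hypothetical placeholders, not the actual SADCAT dictionary or the project's scoring code. The sketch averages the directions of matched words only; it ignores the grammatical-role and semantic weighting described above.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> (dimension, direction).
# The real SADCAT dictionary is far larger; these entries are illustrative only.
TOY_LEXICON = {
    "friendly": ("warmth", 1),
    "kind": ("warmth", 1),
    "cruel": ("warmth", -1),
    "skilled": ("competence", 1),
    "powerful": ("competence", 1),
    "helpless": ("competence", -1),
}


def score_caption(caption: str) -> dict:
    """Return the mean direction per dimension over matched dictionary words."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    totals = Counter()  # summed directions per dimension
    hits = Counter()    # number of matched words per dimension
    for token in tokens:
        if token in TOY_LEXICON:
            dimension, direction = TOY_LEXICON[token]
            totals[dimension] += direction
            hits[dimension] += 1
    # A dimension with no matched words scores None rather than 0,
    # so "no evidence" stays distinguishable from "balanced evidence".
    return {
        dim: (totals[dim] / hits[dim] if hits[dim] else None)
        for dim in ("warmth", "competence")
    }


# A made-up caption standing in for a model-generated image description.
caption = "A powerful king stands beside a kind but helpless servant."
print(score_caption(caption))
```

Returning `None` for an unmatched dimension is a design choice of this sketch: many captions will contain no dictionary words at all, and treating those as neutral would bias collection-level averages toward zero.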

Section 04

Data Processing Flow and Experimental Design Verification

Data processing is described as six stages: data auditing (cleaning and preprocessing the collection metadata), three parallel model inference pipelines (BLIP-2, LLaVA, and DeepSeek each generate image descriptions), LLaVA model verification (comparing its output with manual annotations to evaluate accuracy), comprehensive data analysis (integrating the model outputs and applying SADCAT scoring), and museum-level macro analysis. The experimental design uses several verification strategies: comparing the outputs of the different models, manually reviewing samples and computing agreement coefficients, using cross-validation to evaluate stability, and running controlled experiments on how prompt wording affects model output.
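The article does not say which agreement coefficient is used when model output is compared with manual annotations. As one common choice, the sketch below computes Cohen's kappa for two label sequences. The `cohen_kappa` function and the example labels are assumptions made for illustration, not data or code from the project.

```python
from collections import Counter


def cohen_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: a model's warmth category vs. a human annotator's.
model_labels = ["high", "high", "low", "low", "high", "low", "high", "low"]
human_labels = ["high", "low", "low", "low", "high", "low", "high", "high"]
kappa = cohen_kappa(model_labels, human_labels)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label dominates the sample. For continuous scores rather than categories, a correlation or intraclass coefficient would be a more natural fit.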

Section 05

Application Value and Ethical Considerations

This study has both academic and social value. It demonstrates the potential of computational humanities in art history research and opens new paths for large-scale analysis of visual culture; it also provides data to support museum curation and public education, helping to reveal and reflect on implicit biases in cultural heritage. There are ethical challenges as well: automated detection can produce false positives and false negatives, and the black-box nature of the models obscures the basis for their judgments. The study therefore emphasizes human-machine collaboration: AI tools assist researchers rather than replace them, and machine results need to be reviewed and interpreted by professional scholars.
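One way to put human-machine collaboration into practice is to route uncertain cases to a scholar instead of accepting every machine score. The sketch below flags an artwork for manual review when the per-model Warmth scores are missing or disagree by more than a threshold. The `needs_review` function, the 0.5 threshold, and all scores are hypothetical and are not taken from the project.

```python
def needs_review(scores: dict, max_spread: float = 0.5) -> bool:
    """Flag an artwork when per-model scores are missing or diverge too much."""
    values = list(scores.values())
    # No scores, or any model without a score, always goes to a human.
    if not values or any(v is None for v in values):
        return True
    # Otherwise flag when the models disagree by more than the threshold.
    return max(values) - min(values) > max_spread


# Hypothetical Warmth scores per artwork and per captioning model.
warmth_by_model = {
    "artwork_a": {"BLIP-2": 0.8, "LLaVA": 0.7, "DeepSeek": 0.9},
    "artwork_b": {"BLIP-2": 0.9, "LLaVA": -0.2, "DeepSeek": 0.4},
    "artwork_c": {"BLIP-2": None, "LLaVA": 0.1, "DeepSeek": 0.2},
}

review_queue = [
    artwork for artwork, scores in warmth_by_model.items() if needs_review(scores)
]
print(review_queue)
```

Agreement between models does not guarantee a correct result, since all three could share the same bias, so a random sample of unflagged items would still need manual checking.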

Section 06

Future Development Directions

The project's open-source implementation provides a reusable technical foundation for related research. Future work could extend it to other museums and other types of artwork, develop finer-grained stereotype classification schemes, explore additional multimodal model architectures, and build larger manually annotated datasets. The approach could also be applied to contemporary art, combined with audience behavior data to study differences in how stereotypes are perceived. As multimodal AI advances, computational humanities research of this kind is expected to make further progress.
