EXIST 2026: A Multimodal Sexism Detection System Integrating Eye-Tracking, Heart Rate, and EEG Signals

A People-Centered Multimodal Sexism Detection Study Combining Eye-Tracking, Heart Rate Monitoring, EEG, and Vision-Language Models

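Tags: multimodal learning, sexism detection, eye-tracking, EEG, heart rate monitoring, content moderation, TikTok, vision-language models, AI safety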
Published 2026-06-06 17:27 | Recent activity 2026-06-06 17:57 | Estimated read 9 min

Section 01

[Introduction] EXIST 2026: A Multimodal Sexism Detection System Integrating Physiological Signals and Vision-Language Models

Project Overview

  • Original Author/Maintainer: ivanarcos02
  • Source Platform: GitHub
  • Release Time: June 2026
  • Core Direction: EXIST 2026 Challenge on "People-Centered Multimodal Sexism Detection"

Core Innovation

The project combines eye-tracking, heart rate monitoring, and EEG signals with Vision-Language Models (VLMs) in a single multimodal system. Human physiological and cognitive responses serve as additional evidence for detecting sexist content, going beyond what text-only or image-only analysis can capture.

Application Scenarios

The approach targets content moderation on social media platforms such as TikTok and explores a "people-centered" paradigm for AI safety.

Section 02

Research Background and Problem Definition

Limitations of Traditional Detection

Traditional sexism detection relies on text analysis or image recognition and ignores the physiological and cognitive responses people show when they encounter discriminatory content.

Background of EXIST Challenge

EXIST (Sexism Identification in Social Networks) is an IberLEF series evaluation task. The 2026 edition focuses on "people-centered multimodal detection". Its core hypothesis is that humans produce measurable physiological responses when viewing discriminatory content, and that these responses can be used as detection signals.

Core Problem

How can physiological signals be integrated with AI models to detect sexism more accurately and in closer agreement with how people actually perceive it?

Section 03

Core Innovation: Integration of Multimodal Physiological Signals and VLM

Physiological Signal Collection

  1. Eye-Tracking: Analyze fixation distribution, saccade paths, pupil changes, and regression behavior to reflect attention allocation and emotional arousal.
  2. Heart Rate Monitoring: Capture autonomic nervous system responses via heart rate variability (HRV), heart rate acceleration, and temporal correlation.
  3. EEG: Extract event-related potentials (ERP), spectral features, and brain region activation to directly measure neural activity.
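
As a rough illustration of the kind of features listed above, the sketch below computes RMSSD (a common time-domain HRV measure) from RR intervals and the share of fixation time that falls inside a screen region. The function names, the input layout, and the sample values are assumptions made for this example; the project's actual feature set is not specified here.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def dwell_share(fixations, region):
    """Fraction of total fixation time spent inside a rectangular region.

    fixations: list of (x, y, duration_ms) tuples (hypothetical layout).
    region: (x0, y0, x1, y1) in the same screen coordinates.
    """
    x0, y0, x1, y1 = region
    total = sum(d for _, _, d in fixations)
    inside = sum(d for x, y, d in fixations if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / total if total else 0.0


# Made-up values, only to show the call shape.
rr = [800, 810, 790, 805]
fixations = [(100, 100, 200), (400, 300, 300), (120, 90, 500)]
hrv_feature = rmssd(rr)
attention_feature = dwell_share(fixations, (0, 0, 200, 200))  # 0.7
```

EEG features such as ERPs or band power would normally come from a dedicated signal-processing library and are left out of this sketch.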

Vision-Language Model (VLM)

The system integrates models such as CLIP and BLIP for video frame understanding, cross-modal alignment, and context modeling, and uses them to extract visual-semantic features.
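
Cross-modal alignment in CLIP-style models comes down to comparing image and text embeddings in a shared space. The sketch below shows that comparison with toy vectors standing in for real model outputs. The function names, the vectors, and the temperature value are assumptions for illustration, not the project's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probs(frame_emb, label_embs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities.

    frame_emb: embedding of one video frame (toy vector here).
    label_embs: one text embedding per candidate label description.
    temperature: assumed scaling factor; smaller values sharpen the output.
    """
    logits = [cosine(frame_emb, e) / temperature for e in label_embs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Toy 2-d embeddings: the frame lies close to the first label's text embedding.
probs = zero_shot_probs([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
```

In a real pipeline the vectors would come from the image and text encoders of the chosen VLM.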

Fusion Logic

Physiological signals are combined with VLM features in one multimodal detection system, so that each modality compensates for the weaknesses of the others.
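
One simple way to combine the two feature sources is feature-level fusion: concatenate the vectors and score them with a linear classifier. The sketch below assumes that design; the source does not say which fusion strategy the project uses, and the weights shown are placeholders rather than trained values.

```python
import math


def fuse_features(vlm_feats, physio_feats):
    """Feature-level fusion by concatenation (assumed strategy)."""
    return list(vlm_feats) + list(physio_feats)


def logistic_score(features, weights, bias=0.0):
    """Probability-like score from a linear model with a sigmoid output."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: two VLM features plus one HRV and one gaze feature.
fused = fuse_features([0.8, 0.1], [0.3, 0.7])
score = logistic_score(fused, weights=[1.5, -0.5, 0.4, 0.9], bias=-1.0)
```

Alternatives such as late fusion (averaging per-modality predictions) or attention-based fusion follow the same input and output shape.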

Section 04

Analysis of Technical Implementation Architecture

Preprocessing Pipeline

  • Time Synchronization: Align physiological signals with the video timeline
  • Signal Filtering: Remove noise and artifacts
  • Feature Extraction: Extract valid features from raw signals
  • Data Cleaning: Handle missing values and outliers
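
The cleaning and filtering steps above can be sketched with the standard library alone. The example assumes missing samples are marked as None, uses a median-based clipping rule for outliers, and smooths with a centered moving average. All three choices are illustrative, not the project's documented pipeline.

```python
from statistics import median


def fill_missing(samples):
    """Linearly interpolate None gaps; edge gaps take the nearest valid value."""
    known = [i for i, v in enumerate(samples) if v is not None]
    if not known:
        raise ValueError("no valid samples")
    out = list(samples)
    for i, v in enumerate(samples):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = samples[right]
        elif right is None:
            out[i] = samples[left]
        else:
            t = (i - left) / (right - left)
            out[i] = samples[left] + t * (samples[right] - samples[left])
    return out


def clip_outliers(samples, k=3.0):
    """Clip values further than k median absolute deviations from the median."""
    med = median(samples)
    mad = median([abs(v - med) for v in samples])
    if mad == 0:
        return list(samples)
    low, high = med - k * mad, med + k * mad
    return [min(max(v, low), high) for v in samples]


def moving_average(samples, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical heart-rate trace with one gap and one spike.
raw = [70, 71, None, 71, 200]
clean = moving_average(clip_outliers(fill_missing(raw)))
```

Real EEG preprocessing would usually add band-pass filtering and artifact rejection, which need a signal-processing library.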

Prompt Engineering

Prompt templates spell out the task definition, the fine-grained labels (e.g., direct discrimination, micro-discrimination), and how contextual information should be used.
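
A prompt template of this kind might look like the sketch below. The two fine-grained labels come from the text above; the "not sexist" label, the field names, and the wording of the instructions are assumptions added for the example.

```python
# "not sexist" is an assumed negative class; the other two labels are from the text.
LABELS = ("not sexist", "direct discrimination", "micro-discrimination")


def build_prompt(caption, transcript, labels=LABELS):
    """Assemble a classification prompt from task text, labels, and context."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Task: decide whether the video described below contains sexist content.\n"
        f"Choose exactly one label:\n{options}\n\n"
        f"Caption: {caption}\n"
        f"Transcript: {transcript}\n"
        "Answer with the label only."
    )


prompt = build_prompt("example caption", "example transcript")
```

Keeping the label list in one constant makes it easy to check the model's answer against the allowed set.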

Experimental Configuration

The project provides hyperparameter settings, training strategies, and evaluation metrics suited to sexism detection.
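
The source does not name the metrics used. As an assumption, the sketch below computes precision, recall, and F1 for the positive (sexist) class, a common choice for binary detection tasks.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# One true positive, one false positive, one false negative.
precision, recall, f1 = binary_f1([1, 1, 0, 0], [1, 0, 1, 0])  # (0.5, 0.5, 0.5)
```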

Section 05

Scientific Value and Practical Significance

Methodological Innovation

The work is presented as the first large-scale application of physiological signals to social media content moderation. It proposes a "people-centered" AI safety research paradigm that could be extended to identifying other kinds of harmful content.

Theoretical Contribution

The study explores the neurophysiological mechanisms behind human perception of sexism, differences across populations, and how well subjective reports agree with objective indicators.

Practical Value

  • Identify gray-area content
  • Understand the reasons for user discomfort
  • Optimize content recommendation algorithms

Section 06

Technical Challenges and Solutions

Data Alignment Difficulty

The modalities are sampled at very different rates (video: 30 fps, heart rate: 1 Hz, EEG: 1000 Hz). Solution: sliding windows plus interpolation to bring all signals onto a common time grid.
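
A minimal sketch of this alignment, assuming the video frame times serve as the common grid: the slow heart-rate signal is linearly interpolated up to the frame times, and the fast EEG signal is averaged over a short window starting at each frame. Function names and sample values are hypothetical.

```python
def interpolate_at(times, values, query_times):
    """Linear interpolation at sorted query_times; clamps outside the range.

    times must be strictly increasing and the same length as values.
    """
    out = []
    j = 0
    for q in query_times:
        if q <= times[0]:
            out.append(values[0])
            continue
        if q >= times[-1]:
            out.append(values[-1])
            continue
        while times[j + 1] < q:  # advance to the segment containing q
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (q - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


def window_means(samples, sample_rate_hz, frame_times, window_s):
    """Mean of a high-rate signal over [t, t + window_s) for each frame time t."""
    out = []
    for t in frame_times:
        start = int(round(t * sample_rate_hz))
        stop = int(round((t + window_s) * sample_rate_hz))
        chunk = samples[start:stop]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out


fps = 30
frame_times = [i / fps for i in range(2 * fps)]  # two seconds of video
hr_on_grid = interpolate_at([0, 1, 2], [60, 66, 72], frame_times)  # 1 Hz -> 30 fps
eeg = [0.0] * 2000  # placeholder for 2 s of 1000 Hz EEG
eeg_on_grid = window_means(eeg, 1000, frame_times, 1 / fps)  # 1000 Hz -> 30 fps
```

After this step every modality has one value per video frame, so the features can be stacked row by row.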

Individual Differences

Physiological responses vary from person to person. Solution: per-subject normalization plus transfer learning, to balance generalization against individual differences.
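
Per-subject normalization is often done by z-scoring each participant's signal against their own mean and standard deviation. The sketch below assumes that approach and a simple dictionary layout. The transfer learning part is not shown.

```python
from statistics import mean, pstdev


def normalize_per_subject(records):
    """Z-score each subject's values against that subject's own statistics.

    records: dict mapping a subject id to a list of feature values
    (hypothetical layout). Constant signals map to zeros.
    """
    out = {}
    for subject, values in records.items():
        mu = mean(values)
        sigma = pstdev(values)
        out[subject] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return out


# Two subjects with different resting heart rates but the same relative change.
normalized = normalize_per_subject({"s1": [60, 70, 80], "s2": [90, 100, 110]})
```

After normalization both subjects in the example yield the same sequence, which is the point: the model sees the change relative to each person's baseline rather than the absolute level.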

Data Sparsity

Labeled physiological data is scarce. Solution: Semi-supervised learning + data augmentation to make full use of limited data.
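
Two common building blocks for this setting are noise-based augmentation of a signal and pseudo-labeling of confident predictions on unlabeled data. The sketch below shows both under assumed parameter values (noise scale, confidence threshold). The source does not state which specific techniques the project uses.

```python
import random


def jitter(signal, sigma=0.05, rng=None):
    """Augment a signal by adding zero-mean Gaussian noise to every sample."""
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, sigma) for v in signal]


def select_pseudo_labels(probabilities, threshold=0.9):
    """Pick unlabeled items the model is confident about.

    probabilities: predicted probability of the positive class per item.
    Returns a list of (index, label) pairs with label 1 or 0.
    """
    picked = []
    for i, p in enumerate(probabilities):
        if p >= threshold:
            picked.append((i, 1))
        elif p <= 1 - threshold:
            picked.append((i, 0))
    return picked


augmented = jitter([0.2, 0.4, 0.6], rng=random.Random(0))  # seeded for repeatability
pseudo = select_pseudo_labels([0.95, 0.5, 0.03])  # [(0, 1), (2, 0)]
```

Items between the two thresholds stay unlabeled and can be revisited after the next training round.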

Section 07

Ethical Considerations and Data Privacy

Informed Consent

Subjects must fully understand the experiment's purpose (including possible exposure to uncomfortable content) and participate voluntarily.

Data Privacy

Physiological data (especially EEG) is highly identifiable, requiring strict protective measures.

Research Ethics

Balance research value against the psychological impact on participants, and provide psychological support mechanisms.

Application Ethics

Stay alert to misuse of the technology (e.g., emotional manipulation, improper moderation) and define the boundaries of legitimate use.

Section 08

Future Directions and Summary

Future Development

  1. Expand Modalities: Add galvanic skin response (GSR), facial expression recognition, and speech emotion analysis
  2. Real-Time Detection: Develop an instant moderation system for live content
  3. Cross-Platform/Cultural: Verify the generalization of the method on other platforms and in different cultural contexts

Summary

This research moves beyond the traditional content moderation paradigm by integrating physiological signals with AI, so that the system better reflects how people actually respond to content. It offers a new direction for AI safety and platform moderation, and the approach could be extended to mental health, education, and other fields.