Zing Forum

Multimodal Trolley Problem: Exploring Moral Biases and Alignment Issues in Large Language Models

A study based on the classic Moral Machine experimental framework that tests whether Claude, GPT-4.1, and Gemini exhibit demographic biases when making moral decisions in multimodal scenarios.

Tags: LLM, AI alignment, moral bias, multimodal, trolley problem, FairFace, autonomous vehicles, ethics, Claude, GPT-4
Published 2026-04-29 06:59 | Recent activity 2026-04-29 10:03 | Estimated read 9 min

Section 01

Introduction: Multimodal Trolley Problem Research—Exploring Moral Biases and Alignment Issues in LLMs

This study builds on the classic Moral Machine experimental framework to test whether three mainstream large language models (LLMs), Claude, GPT-4.1, and Gemini, exhibit demographic biases when making moral decisions in multimodal scenarios. Its design pairs two experimental arms (text and image) with mirrored-pair controls, and its code is released as open source. The aim is to probe a core question of AI value alignment and to inform the ethical safety of LLM applications in high-risk domains.

Section 02

Research Background: Ethical Dilemmas in Autonomous Driving and LLM Bias Issues

The classic ethical dilemma faced by autonomous vehicles is a variant of the 'trolley problem': when the brakes fail, which group of pedestrians should the vehicle hit? This question goes to the core of AI value alignment. The Moral Machine experiment, whose results MIT-led researchers published in 2018, revealed that people's moral preferences regarding factors such as age and gender differ across cultures. Now that LLMs are being integrated into safety-critical systems, two questions become urgent: Do these models internalize demographic biases? Are their decisions consistent between text descriptions and real face images? This study aims to answer both.

Section 03

Research Design and Methodology: Rigorous Experimental Framework and Controls

Experimental Framework

  • Three-model comparison: Claude (claude-sonnet-4-6), GPT-4.1, and Gemini (gemini-2.5-flash).
  • Dual-arm design: A text arm (demographic label descriptions only) and an image arm (FairFace face photos).
  • Four test dimensions: Race (6 paired groups), gender, age, and utilitarianism (group size).
  • Three role prompts: Each scenario is randomly assigned one of three roles: 'default (autonomous driving algorithm)', 'expert (moral philosopher)', or 'ordinary person'.
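
The setup above can be sketched as a small seeded generator. This is an illustration under stated assumptions, not the study's code: the `Scenario` class, the `make_scenarios` function, and the short role and dimension strings are hypothetical names, and cycling through the dimensions in order is an assumption made here for simplicity. Only the three roles, the four dimensions, and the use of a seed come from the description.

```python
import random
from dataclasses import dataclass

# Illustrative stand-ins for the study's roles and test dimensions.
ROLES = ("default", "expert", "ordinary")
DIMENSIONS = ("race", "gender", "age", "utilitarianism")


@dataclass(frozen=True)
class Scenario:
    scenario_id: int
    dimension: str
    role: str


def make_scenarios(n: int, seed: int) -> list[Scenario]:
    """Build n scenarios, each with a randomly assigned role prompt."""
    rng = random.Random(seed)  # private RNG, so the seed fully determines the run
    return [
        # Assumption: dimensions are cycled in order; only the role is random.
        Scenario(i, DIMENSIONS[i % len(DIMENSIONS)], rng.choice(ROLES))
        for i in range(n)
    ]
```

Calling `make_scenarios(1000, seed=1)` twice returns identical lists, which is the property a seeded experiment relies on.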

Mirrored Pairing Control

Each scenario is generated in a base version and a mirrored version. The mirrored version swaps the pedestrians' positions and reverses the action descriptions, which controls for position bias and omission bias. A choice counts as a true preference only when the model selects the same demographic group in both versions.
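
A minimal sketch of this consistency check follows. The `Dilemma` class, the 'stay'/'swerve' action names, and the helper functions are hypothetical; the original describes the rule but not its implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Dilemma:
    stay_hits: str    # group hit if the car stays on course
    swerve_hits: str  # group hit if the car swerves


def mirror(d: Dilemma) -> Dilemma:
    """Swap which group is tied to which action."""
    return replace(d, stay_hits=d.swerve_hits, swerve_hits=d.stay_hits)


def spared_group(d: Dilemma, action: str) -> str:
    """Return the group left unharmed by the chosen action."""
    if action == "stay":
        return d.swerve_hits
    if action == "swerve":
        return d.stay_hits
    raise ValueError(f"unknown action: {action!r}")


def true_preference(base: Dilemma, base_action: str, mirror_action: str) -> Optional[str]:
    """Group spared in both versions, or None if the two choices disagree."""
    in_base = spared_group(base, base_action)
    in_mirror = spared_group(mirror(base), mirror_action)
    return in_base if in_base == in_mirror else None
```

Under this rule, a model that always answers 'stay' spares a different group in each version, so it yields `None`. Its behavior is then attributed to the action framing, not to a demographic preference.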

Two-Stage Image Processing

  1. Perception stage: The model identifies the attributes of the people in the image, and its answers are verified against the FairFace labels.
  2. Decision stage: Only scenarios with correct perception proceed to the moral choice.

All API calls use temperature=0 to make outputs as reproducible as possible.
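
The two-stage gate can be sketched as below. The `perceive` and `decide` callables stand in for the model API calls, which the original does not show. Their signatures, the `run_image_scenario` name, and the label dictionary format are assumptions made for this example.

```python
from typing import Callable, Optional

TEMPERATURE = 0.0  # the setting the study reports for all API calls


def run_image_scenario(
    image_id: str,
    true_labels: dict[str, str],
    perceive: Callable[[str, float], dict[str, str]],
    decide: Callable[[str, float], str],
) -> Optional[str]:
    """Return the model's moral choice, or None if perception failed."""
    perceived = perceive(image_id, TEMPERATURE)
    # Stage 1: every ground-truth attribute must match the model's reading.
    if any(perceived.get(key) != value for key, value in true_labels.items()):
        return None
    # Stage 2: only correctly perceived scenarios reach the moral choice.
    return decide(image_id, TEMPERATURE)
```

Returning `None` for failed perception keeps those scenarios out of the bias statistics, so a misread face is not counted as a moral preference.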

Section 04

Technical Implementation and Open-Source Value: Modular Design and Transparency

Code Structure

Modular design: scenario_generator.py (scenario generation and API calls), text_arm.py/image_arm.py (experimental arm processing), face_sampler.py (FairFace sampling), report.py (HTML report generation).

Statistical Rigor

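Two independent experiments were run (SEED=1 and SEED=2). In each, every model handled 1,000 scenarios per experimental arm, for 24,000 scenario-level responses in total. That total is consistent with each scenario being answered in both its base and mirrored versions (3 models × 2 arms × 2 seeds × 1,000 scenarios × 2 versions). The sample size is intended to give the statistical tests adequate power.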
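
The original does not say which statistical test is applied. As one plausible illustration, an exact two-sided binomial test can check whether a model favors one group more often than chance (50%) among its true preferences. The function name and the choice of test are assumptions made here, not the study's stated method.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials under p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    tail = min(k, n - k)
    # Under p = 0.5 the distribution is symmetric, so double the smaller tail.
    lower = sum(comb(n, i) for i in range(tail + 1))
    return min(1.0, 2 * lower / 2**n)
```

For example, `binomial_two_sided_p(550, 1000)` gives the p-value for a model that spares one group in 550 of 1,000 consistent pairs. With samples of this size, fairly small deviations from 50% become detectable.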

Open-Source Significance

  • Reproducibility: Facilitates verification and expansion by other researchers.
  • Transparency: Allows the public and regulatory bodies to understand LLM performance in ethical decision-making.
  • Methodological reference: Provides an experimental framework reference for AI ethics research.

Section 05

Potential Findings and Implications: Text vs. Image Differences and Cross-Model Comparisons

Text vs. Image Differences

If a model's decisions are inconsistent between text and image conditions, it may mean that visual understanding introduces additional biases, or that text descriptions cannot fully capture associations triggered by visuals.

Impact of Role Settings

Testing three roles shows whether a model's moral reasoning stays consistent across roles or shifts to meet the expectations of the assigned role.

Cross-Model Comparisons

Comparing the performance of the three models can reveal whether different training data and safety alignment strategies lead to systematic value differences, and whether there are neutral models or those with specific preferences.
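
The three comparisons in this section (by arm, by role, and by model) share one computation: the share of true preferences that favor a given group, split by some field. The sketch below assumes each result is a dictionary with a `preference` entry that is `None` when the mirrored pair was inconsistent. The record layout and the function name are hypothetical.

```python
from collections import defaultdict


def preference_rates(records: list[dict], by: str, group: str) -> dict[str, float]:
    """Share of true preferences favoring `group`, split by the `by` field."""
    favored = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        if record["preference"] is None:  # inconsistent mirrored pair: skip
            continue
        total[record[by]] += 1
        favored[record[by]] += record["preference"] == group
    return {key: favored[key] / total[key] for key in total}
```

Calling it with `by="arm"` exposes text versus image gaps, while `by="role"` and `by="model"` cover the other two comparisons. A large gap between keys is a prompt for a formal test, not a finding in itself.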

Section 06

Limitations and Ethical Considerations: Methodological Constraints and Research Ethics Challenges

Methodological Limitations

  • Simplified scenarios: Real autonomous driving ethical decisions are more complex than binary choices.
  • Dataset bias: FairFace, though carefully curated, may still over- or under-represent some demographic groups.
  • Laboratory setting: temperature=0 aids reproducibility but does not reflect the sampling randomness of real-world deployments.

Research Ethics

  • Should AI be allowed to make life-or-death decisions (even in simulations)?
  • Who has the right to decide the 'correct' direction of moral alignment after biases are found?
  • Could publicizing the findings be maliciously exploited?

The researchers address some of these concerns through open-source practices, on the view that transparency is the first step toward trust.

Section 07

Implications for AI Alignment Research: Methodological Contributions

This study points the AI safety field in a concrete direction: away from abstract discussions of value alignment and toward measurable bias detection. Its methodological contributions include:

  1. Multimodal bias testing framework: Systematically comparing model behavior under text and visual inputs.
  2. Mirrored control technique: A reusable experimental template for eliminating position bias and framing effects.
  3. Large-scale comparative study: A demonstration of organizing complex experiments across multiple commercial APIs.

Section 08

Conclusion: Ethical Safety is Essential for High-Risk LLM Applications

As LLMs move from chatbots into domains such as autonomous driving and medical diagnosis, understanding their moral decision-making patterns becomes essential for safety. Through its controlled design and open-source practices, this study offers a reproducible way to measure demographic bias in those decisions. Whatever the results, it carries a reminder: the development of technical capability must keep pace with our understanding of a model's value orientations, and more research is needed to illuminate the ethical landscape inside the black box before AI is deployed in life-impacting scenarios.