# Multimodal Trolley Problem: Exploring Moral Biases and Alignment Issues in Large Language Models

> A study based on the classic Moral Machine experimental framework that tests whether Claude, GPT-4.1, and Gemini exhibit demographic biases when making moral decisions in multimodal scenarios.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-28T22:59:25.000Z
- Last activity: 2026-04-29T02:03:32.879Z
- Popularity: 162.9
- Keywords: LLM, AI alignment, moral bias, multimodal, trolley problem, FairFace, autonomous vehicles, ethics, Claude, GPT-4, Gemini
- Page link: https://www.zingnex.cn/en/forum/thread/llm-github-simonjdh2-language-model-alignment-in-multimodal-trolley-problems
- Canonical: https://www.zingnex.cn/forum/thread/llm-github-simonjdh2-language-model-alignment-in-multimodal-trolley-problems

---

## Introduction: Multimodal Trolley Problem Research—Exploring Moral Biases and Alignment Issues in LLMs

This study builds on the classic Moral Machine experimental framework to test whether three mainstream large language models (LLMs), Claude, GPT-4.1, and Gemini, exhibit demographic biases when making moral decisions in multimodal scenarios. Its design combines two experimental arms (text and image) with mirrored pairing controls, and the code is open source. The aim is to make a core question of AI value alignment measurable and to inform the ethical safety of LLM applications in high-risk domains.

## Research Background: Ethical Dilemmas in Autonomous Driving and LLM Bias Issues

The classic ethical dilemma faced by autonomous vehicles is a variant of the 'trolley problem'—when brakes fail, which group of pedestrians should the vehicle hit? This touches on the core of AI value alignment. MIT's 2018 Moral Machine experiment revealed differences in people's moral preferences regarding factors like age and gender across different cultures. Now that LLMs are integrated into safety-critical systems, urgent questions arise: Do these models internalize demographic biases? Are decisions consistent between text descriptions and real face images? This study aims to answer these questions.

## Research Design and Methodology: Rigorous Experimental Framework and Controls

### Experimental Framework
- **Three-model comparison**: Claude (claude-sonnet-4-6), GPT-4.1, and Gemini (gemini-2.5-flash) are tested.
- **Dual-arm design**: A text arm (demographic label descriptions only) and an image arm (FairFace face photos).
- **Four-dimensional testing**: Race (6 paired groups), gender, age, and utilitarianism (group size).
- **Three role prompts**: Each scenario is randomly assigned one of three roles: 'default (autonomous driving algorithm)', 'expert (moral philosopher)', or 'ordinary person'.
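
The scenario design above can be sketched as follows. This is a minimal illustration, not the repository's `scenario_generator.py`: the `ScenarioSpec` class, the `sample_specs` function, the short role and dimension labels, and the choice to draw the dimension uniformly at random are all assumptions made for the example. Only the random role assignment and the use of a fixed seed come from the description.

```python
import random
from dataclasses import dataclass

# Hypothetical short labels for the three role prompts and four dimensions.
ROLES = ("default", "expert", "ordinary")
DIMENSIONS = ("race", "gender", "age", "utilitarianism")


@dataclass(frozen=True)
class ScenarioSpec:
    scenario_id: int
    dimension: str  # which attribute differs between the two pedestrian groups
    role: str       # which role prompt the model receives


def sample_specs(n: int, seed: int) -> list[ScenarioSpec]:
    """Draw n scenario specs with a dedicated, seeded random generator."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return [
        ScenarioSpec(i, rng.choice(DIMENSIONS), rng.choice(ROLES))
        for i in range(n)
    ]
```

Calling `sample_specs(1000, seed=1)` twice returns the same list, which is the property a seeded experiment relies on.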

### Mirrored Pairing Control
Each scenario is generated in a base version and a mirrored version. The mirrored version swaps the pedestrians' positions and reverses the action descriptions, which controls for position bias and omission bias. A choice counts as a true preference only when the model selects the same feature group in both versions.
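
The mirrored pairing rule can be sketched as below. This is an illustration under stated assumptions rather than the study's code: the `Scenario` class, the `"stay"`/`"swerve"` action names, and the helper functions are hypothetical, and "selecting" a group is interpreted here as sparing it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    stay_group: str    # group hit if the vehicle stays on course
    swerve_group: str  # group hit if the vehicle swerves


def mirror(s: Scenario) -> Scenario:
    """Swap the two groups so each one occupies the other's position."""
    return Scenario(stay_group=s.swerve_group, swerve_group=s.stay_group)


def spared_group(s: Scenario, action: str) -> str:
    """Return the group that survives the given action."""
    if action == "stay":
        return s.swerve_group
    if action == "swerve":
        return s.stay_group
    raise ValueError(f"unknown action: {action!r}")


def true_preference(base: Scenario, base_action: str,
                    mirrored_action: str) -> Optional[str]:
    """Return the spared group only if both versions spare the same one."""
    first = spared_group(base, base_action)
    second = spared_group(mirror(base), mirrored_action)
    return first if first == second else None
```

A model that always answers `"stay"` spares a different group in each version, so `true_preference` returns `None` and the response pair is treated as inaction or position bias rather than a demographic preference.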

### Two-Stage Image Processing
1. Perception stage: The model identifies the attributes of people in the image and verifies them against FairFace labels.
2. Decision stage: Scenarios with correct perception proceed to moral choice.

All API calls use temperature=0 to ensure reproducibility.
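
The two stages can be sketched as a gate in front of the decision call. This is a minimal sketch, not the repository's `image_arm.py`: `perceive` and `decide` are hypothetical callables standing in for the model API calls, and the attribute keys `race`, `gender`, and `age` are assumed names for the labels being checked.

```python
from typing import Callable, Mapping, Optional, Sequence

Attributes = Mapping[str, str]


def perception_ok(predicted: Attributes, label: Attributes,
                  keys: Sequence[str] = ("race", "gender", "age")) -> bool:
    """True when every checked attribute matches the reference label."""
    return all(predicted.get(k) == label[k] for k in keys)


def run_image_trial(image_ids: Sequence[str],
                    labels: Mapping[str, Attributes],
                    perceive: Callable[[str], Attributes],
                    decide: Callable[[Sequence[str]], str]) -> Optional[str]:
    """Stage 1: verify perception for every face. Stage 2: ask for a decision.

    Returns None when any face is misperceived, so the scenario is
    excluded from the moral-choice analysis.
    """
    for image_id in image_ids:
        if not perception_ok(perceive(image_id), labels[image_id]):
            return None
    return decide(image_ids)
```

Gating on perception keeps misidentified faces out of the decision data, so a measured preference is not simply an artifact of the model seeing a different attribute than the label records.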

## Technical Implementation and Open-Source Value: Modular Design and Transparency

### Code Structure
The code follows a modular design:

- `scenario_generator.py`: scenario generation and API calls
- `text_arm.py` / `image_arm.py`: experimental arm processing
- `face_sampler.py`: FairFace sampling
- `report.py`: HTML report generation

### Statistical Rigor
Two independent experiments were run (SEED=1 and SEED=2). In each run, every model handled 1,000 scenarios per experimental arm, for a total of 24,000 scenario-level responses, a sample large enough to give the statistical tests adequate power.
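
The 24,000 figure can be checked with a short calculation. The factor of two for base and mirrored versions is an assumption: the text does not say how the total is composed, but counting one response per version of each scenario reproduces the stated number. The variable names are illustrative.

```python
# Counts taken from the description above.
seeds = 2                  # SEED=1 and SEED=2
models = 3                 # Claude, GPT-4.1, Gemini
arms = 2                   # text arm and image arm
scenarios_per_cell = 1000  # per model, per arm, per run

# Assumption: each scenario is answered twice (base and mirrored version).
versions_per_scenario = 2

scenarios = seeds * models * arms * scenarios_per_cell
responses = scenarios * versions_per_scenario

print(scenarios)  # 12000
print(responses)  # 24000
```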

### Open-Source Significance
- **Reproducibility**: Facilitates verification and expansion by other researchers.
- **Transparency**: Allows the public and regulatory bodies to understand LLM performance in ethical decision-making.
- **Methodological reference**: Provides an experimental framework reference for AI ethics research.

## Potential Findings and Implications: Text vs. Image Differences and Cross-Model Comparisons

### Text vs. Image Differences
If a model's decisions are inconsistent between text and image conditions, it may mean that visual understanding introduces additional biases, or that text descriptions cannot fully capture associations triggered by visuals.
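
One way to quantify such an inconsistency is to compare how often a model favors a given group in each arm. The sketch below uses a pooled two-proportion z-test. The source does not name the statistical test the study uses, so both the choice of test and the function name `two_proportion_z` are assumptions made for illustration.

```python
import math


def two_proportion_z(k_text: int, n_text: int,
                     k_image: int, n_image: int) -> tuple[float, float]:
    """Pooled two-proportion z-test.

    k_* is the number of scenarios in which the group was favored and
    n_* is the number of valid scenarios in that arm.
    Returns (z statistic, two-sided p-value).
    """
    p_text = k_text / n_text
    p_image = k_image / n_image
    pooled = (k_text + k_image) / (n_text + n_image)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_text + 1 / n_image))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_text - p_image) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value would suggest that the preference rate differs between the text and image conditions for that model. It does not by itself say which of the two explanations above is responsible.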

### Impact of Role Settings
Testing three roles makes it possible to examine whether a model's choices stay consistent across roles or whether it adjusts its moral reasoning to meet the expectations of each role.

### Cross-Model Comparisons
Comparing the performance of the three models can reveal whether different training data and safety alignment strategies lead to systematic value differences, and whether there are neutral models or those with specific preferences.

## Limitations and Ethical Considerations: Methodological Constraints and Research Ethics Challenges

### Methodological Limitations
- **Simplified scenarios**: Real autonomous driving ethical decisions are more complex than binary choices.
- **Dataset bias**: FairFace, though carefully curated, may still over- or under-represent some demographic groups.
- **Laboratory environment**: Temperature=0 ensures reproducibility but may not reflect randomness in real-world deployment.

### Research Ethics
- Should AI be allowed to make life-or-death decisions (even in simulations)?
- Who has the right to decide the 'correct' direction of moral alignment after biases are found?
- Could publicizing findings be maliciously exploited?

The researchers address some of these concerns through open-source practices: transparency is the first step toward trust.

## Implications for AI Alignment Research: Methodological Contributions

This study provides an important direction for the AI safety field: shifting from abstract value alignment discussions to concrete, measurable bias detection. Methodological contributions include:
1. **Multimodal bias testing framework**: Systematically comparing model behavior under text and visual inputs.
2. **Mirrored control technique**: A reusable experimental template to eliminate position bias and framing effects.
3. **Large-scale comparative study**: A demonstration of organizing complex experiments across multiple commercial APIs.

## Conclusion: Ethical Safety is Essential for High-Risk LLM Applications

As LLMs move from chatbots into domains such as autonomous driving and medical diagnosis, understanding their moral decision-making patterns is essential for safety. With its rigorous design and open-source practices, this study offers a concrete way to measure demographic bias in those decisions. Regardless of the results, it is a reminder that advances in capability must keep pace with our understanding of the values models encode, and that more research is needed to illuminate the ethical landscape inside the black box before AI is deployed in life-impacting scenarios.