Section 01
[Introduction] Identifiable Victim Effect in Large Language Models: How Narrative Trumps Numbers
This article examines the 'Identifiable Victim Effect' (IVE) in large language models (LLMs) when they make decisions involving human lives: the tendency to prioritize saving a specific, identifiable individual over a larger but statistically described group. The study found that models aligned with RLHF exhibit a more pronounced bias, and that stronger reasoning capabilities may serve to 'rationalize' emotion-driven choices rather than correct them. The article also analyzes the underlying mechanisms, the practical implications for deployed systems, and possible mitigation strategies, underscoring the need to balance AI 'humanization' against rational fairness.