Section 01
[Introduction] EST-Bench: An LLM Safety Evaluation Benchmark Focused on Extreme Survival Scenarios
EST-Bench is an open-source, deterministic evaluation framework designed to test the safety, policy compliance, and tactical reasoning of large language models (LLMs) in extreme survival scenarios such as harsh conditions, power outages, and resource scarcity. Traditional safety evaluations leave a gap in assessing decision-making in extreme environments; EST-Bench addresses it by giving researchers and developers a standardized set of tools.