Section 01
LexBench: Introduction to the LLM Evaluation System for Multilingual Environmental Law
LexBench is an LLM evaluation benchmark for multilingual environmental law, covering four competency dimensions: information extraction, legal reasoning, numerical analysis, and hallucination detection. Its dataset is built from real multilingual legal documents drawn from three jurisdictions (Saudi Arabia, China, and Finland), and it evaluates mainstream commercial LLMs such as GPT-4o and Claude. The evaluation finds that deep reasoning remains a weak point across models and that performance differs significantly between them. The project is open-source and provides a standardized evaluation tool for the legal AI community.
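To make the four-dimension structure concrete, here is a minimal sketch of how per-dimension results for one model might be aggregated into an overall benchmark score. Only the dimension names come from the section; the `aggregate_scores` function, the equal-weight averaging, and all numeric values are illustrative assumptions, not LexBench's actual scoring method or results.

```python
from statistics import mean

# The four competency dimensions named in the section.
DIMENSIONS = [
    "information_extraction",
    "legal_reasoning",
    "numerical_analysis",
    "hallucination_detection",
]

def aggregate_scores(per_dimension_scores):
    """Average per-dimension accuracies into a single score.

    `per_dimension_scores` maps a dimension name to an accuracy in
    [0, 1]; dimensions missing from the mapping are treated as not
    evaluated and are skipped.
    """
    evaluated = [per_dimension_scores[d] for d in DIMENSIONS
                 if d in per_dimension_scores]
    if not evaluated:
        raise ValueError("no dimensions evaluated")
    return mean(evaluated)

# Fabricated example numbers for one hypothetical model --
# not actual LexBench measurements.
example = {
    "information_extraction": 0.82,
    "legal_reasoning": 0.55,
    "numerical_analysis": 0.61,
    "hallucination_detection": 0.74,
}
print(round(aggregate_scores(example), 3))
```

An equal-weight mean is the simplest choice; a real benchmark might instead weight dimensions by task count or report them separately, as a single averaged number can hide the reasoning gap the evaluation highlights.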