Section 01
Axiom Framework: An Open-Source Tool for Systematically Evaluating LLM Confidence Calibration
Axiom is an open-source evaluation framework that systematically measures the confidence calibration of open-source large language models (LLMs) across multiple task types, including reasoning, commonsense judgment, binary decision-making, and factual accuracy, helping developers spot overconfidence in their models. The framework supports a range of mainstream open-source models and offers two run modes (Kaggle and local). Its evaluation results can inform model selection, fine-tuning, and product design.
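To make "confidence calibration" concrete, here is a minimal sketch of Expected Calibration Error (ECE), a standard metric for this kind of evaluation: the gap between a model's stated confidence and its actual accuracy, averaged over confidence bins. The function name, binning scheme, and sample data below are illustrative assumptions, not Axiom's actual API.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by stated confidence, then take the weighted
    average of |mean confidence - accuracy| per bin. A perfectly
    calibrated model scores 0; overconfident models score higher.
    (Hypothetical helper for illustration, not part of Axiom.)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        avg_conf = confidences[in_bin].mean()  # mean stated confidence in bin
        accuracy = correct[in_bin].mean()      # fraction actually correct in bin
        ece += in_bin.mean() * abs(avg_conf - accuracy)  # weight by bin share
    return ece

# Example: a model that claims 90% confidence but is right only half the time.
confs = [0.9, 0.9, 0.9, 0.9]
hits  = [1, 0, 1, 0]
print(expected_calibration_error(confs, hits))  # ~0.4 -> clearly overconfident
```

A low ECE across task types is what a well-calibrated model would show; a framework like Axiom reports this kind of gap per model and per task so that overconfidence is visible at a glance.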