# TrustMH-Bench: A Credibility Evaluation Benchmark for Large Models in Mental Health Counseling Scenarios

> TrustMH-Bench is a credibility evaluation benchmark specifically designed for large language models (LLMs) in the mental health counseling domain. It systematically assesses LLMs' performance in sensitive counseling scenarios across four dimensions: privacy protection, safety, jailbreak resistance, and fairness.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-05-04T09:14:22.000Z
- 最近活动: 2026-05-04T09:20:35.534Z
- 热度: 154.9
- 关键词: 大语言模型, 心理健康, AI咨询, 可信度评估, 隐私保护, AI安全, 越狱攻击, 公平性, 基准测试, 开源数据集
- 页面链接: https://www.zingnex.cn/en/forum/thread/trustmh-bench
- Canonical: https://www.zingnex.cn/forum/thread/trustmh-bench
- Markdown 来源: floors_fallback

---

## [Introduction] TrustMH-Bench: A Credibility Evaluation Benchmark for Large Models in Mental Health Counseling Scenarios

TrustMH-Bench is a credibility evaluation benchmark for large language models (LLMs) designed for the mental health counseling domain. It systematically assesses LLMs' performance in sensitive counseling scenarios across four dimensions: privacy protection, safety, jailbreak resistance, and fairness. It fills the gap where traditional general-purpose evaluation benchmarks (such as MMLU, HumanEval) fail to capture the unique risks in mental health scenarios. As an open-source comprehensive evaluation dataset, it provides a specialized assessment tool for researchers, developers, and regulators.

## Background: The Rise of AI Mental Health Counseling and Trust Challenges

In recent years, LLMs have shown great potential in the mental health counseling domain and have become an important supplement to global services. However, the risks of sensitive privacy sharing and secondary harm from inappropriate responses pose trust challenges. Traditional benchmarks focus on general knowledge and reasoning and are difficult to cover the unique risks in mental health scenarios, which led to the emergence of TrustMH-Bench.

## Core Evaluation Dimensions: A Comprehensive Review of Credibility Across Four Dimensions

TrustMH-Bench evaluates across four dimensions:
1. **Privacy Protection**: Identify and handle sensitive information, avoid leaks, remind of boundaries, resist privacy extraction attacks, and comply with regulations such as GDPR/HIPAA;
2. **Safety**: Identify crisis signals, avoid inappropriate advice, exercise caution with medical recommendations, and maintain a professional stance;
3. **Jailbreak Resistance**: Resist attacks such as inducing the generation of psychological manipulation strategies, dangerous advice, and bypassing safety guardrails;
4. **Fairness**: Detect issues like stereotypes, cultural biases, neglect of minority needs, and language discrimination.

## Dataset Construction: Multi-source Integration and Ethical Assurance

The dataset adopts a multi-source integration strategy: expert annotation (scenarios designed by counselors/psychologists), literature mining (typical counseling situations), adversarial generation (red-team boundary use cases), and desensitized real cases (privacy-processed dialogue fragments). Each use case undergoes multiple rounds of review to ensure evaluation value and ethical compliance.

## Application Value: An Evaluation Tool Benefiting Multiple Parties

TrustMH-Bench provides tools for multiple parties:
- Model developers: Conduct safety assessments during training and fine-tuning phases to fix potential issues;
- Application developers: Perform security audits before product launch;
- Researchers: Use a standardized comparison framework to support horizontal comparisons;
- Regulators: Refer to it for compliance evaluation.

## Limitations and Outlook: Directions for Continuous Improvement

Current limitations: Focuses mainly on English scenarios, uses static test cases (insufficient long-term safety evaluation of dynamic dialogues), and needs to expand coverage of cultural fairness. Future plans: Incorporate community feedback, expand the dataset's coverage and depth, and explore integration with real clinical environments.

## Conclusion: Credible Evolution in the AI Mental Health Domain

TrustMH-Bench marks the evolution of AI mental health from 'functionally usable' to 'safe and credible'. Domain-specific evaluation benchmarks have become an important guarantee for applications in sensitive scenarios. This open-source project is worth attention and participation.
