Zing Forum

JiraiBench: A Bilingual Large Model Evaluation Benchmark for Self-Harm Behavior Detection in Jirai Subculture Communities

JiraiBench is the first bilingual evaluation benchmark specifically designed for detecting self-harm content in Jirai subculture communities, providing a standardized test set to assess the ability of large language models to identify potential mental health risk content.

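Tags: Large language models · Self-harm detection · Jirai (地雷系) · Mental health · Content moderation · Bilingual evaluation · Subculture · AI ethics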
Published 2026-04-13 12:14 · Recent activity 2026-04-13 12:20 · Estimated read 7 min

Section 01

Introduction: JiraiBench—the First Bilingual Evaluation Benchmark for Self-Harm Behavior Detection in Jirai Subculture Communities

JiraiBench is the first bilingual (Chinese and Japanese) evaluation benchmark built specifically for detecting self-harm content in Jirai subculture communities. It provides a standardized test set for assessing how well large language models identify content that signals potential mental health risk. This addresses a gap: neither traditional moderation systems nor existing large models have had a systematic evaluation standard in this area.

Section 02

Background and Motivation: Content Moderation Challenges Brought by Jirai Subculture

In recent years, the "Jirai" subculture, which originated in Japan, has spread rapidly among young people in East Asia. Its dark, decadent aesthetic often comes with expressions of self-harm and depression. As related communities grow, identifying potential self-harm content has become an important problem for both mental health intervention and platform governance. Traditional moderation systems struggle to identify such implicit, context-dependent expressions accurately, and there has been no systematic standard for evaluating how well large models detect them given the subculture's distinctive language style and cultural background. JiraiBench was created to address this gap.

Section 03

Project Overview: Core Positioning and Goals of JiraiBench

JiraiBench is a bilingual (Chinese-Japanese) evaluation benchmark dataset. Its samples are collected from real social media and professionally annotated, and they cover the range of expression forms found in Jirai culture, including implicit hints, direct statements, and subcultural terms. Its core goal is to establish a standardized testing framework that helps researchers and developers understand how large models perform on sensitive content, identify their blind spots, and advance precise, culturally sensitive content detection.

Section 04

Dataset Features: Bilingual, Real-Scene, and Culturally Sensitive Annotation Design

The JiraiBench dataset has four main features:

  1. Bilingual Coverage: Includes Chinese and Japanese samples, reflecting how the subculture spreads across languages and allowing cross-language transfer to be tested;
  2. Real-Scene Data: Collected from real social platforms, retaining the original language style, internet slang, and subcultural expressions;
  3. Fine-Grained Annotation: Labels each sample along several dimensions, such as whether it contains self-harm behavior, how severe it is, and how directly it is expressed;
  4. Cultural Context Sensitivity: Distinguishes purely stylistic expression from real risk signals, avoiding the misjudgments that keyword matching produces.
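
To make the annotation dimensions concrete, here is a minimal Python sketch of what one annotated record could look like. It is an illustration only: the field names, the 0 to 3 severity scale, and the `Directness` categories are assumptions made for this example, not the schema JiraiBench actually publishes.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Language(Enum):
    CHINESE = "zh"
    JAPANESE = "ja"


class Directness(Enum):
    DIRECT = "direct"      # explicit statement
    IMPLICIT = "implicit"  # hint, metaphor, or subcultural term


@dataclass(frozen=True)
class AnnotatedSample:
    """One annotated post (hypothetical schema, not the official one)."""
    text: str
    language: Language
    contains_self_harm: bool
    severity: int                     # assumed scale: 0 (none) to 3 (severe)
    directness: Optional[Directness]  # None when no self-harm is annotated

    def __post_init__(self):
        if not 0 <= self.severity <= 3:
            raise ValueError("severity must be between 0 and 3")
        if self.contains_self_harm:
            if self.directness is None:
                raise ValueError("risk samples need a directness label")
        elif self.severity != 0 or self.directness is not None:
            # Purely stylistic posts carry no severity or directness label.
            raise ValueError("non-risk samples must have severity 0 and no directness")


def group_by_language(samples):
    """Split samples by language so each language can be scored separately."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample.language].append(sample)
    return dict(groups)
```

Keeping `contains_self_harm` separate from the text itself reflects the fourth feature: a post written in the Jirai style is labeled by what it signals, not by which keywords it contains.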

Section 05

Evaluation Methodology: Multi-Dimensional Assessment of Model Capabilities

JiraiBench adopts a multi-dimensional evaluation framework that focuses on:

  1. Balance Between Recall and Precision: Weigh the consequences of missed detections (false negatives) against those of false alarms (false positives);
  2. Cross-Language Consistency: Evaluate whether the model performs consistently on Chinese and Japanese samples;
  3. Implicit Expression Recognition: Test the model's understanding of metaphorical and symbolic self-harm content;
  4. Cultural Adaptability: Examine how well the model understands Jirai-specific terms, symbols, and cultural background.
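
The first two dimensions can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the benchmark's own scoring code: the function names are hypothetical, the choice of an F-beta score with beta = 2 (weighting recall above precision, on the view that a missed detection costs more than a false alarm) is an assumption, and the "consistency gap" is just one simple way to compare per-language scores.

```python
from collections import defaultdict


def precision_recall(y_true, y_pred):
    """Binary precision and recall; returns 0.0 where a ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)      # false alarms
    fn = sum(1 for t, p in pairs if t and not p)      # missed detections
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def per_language_scores(records, beta=2.0):
    """records: iterable of (language, true_label, predicted_label) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for lang, truth, pred in records:
        grouped[lang][0].append(truth)
        grouped[lang][1].append(pred)
    scores = {}
    for lang, (truths, preds) in grouped.items():
        p, r = precision_recall(truths, preds)
        scores[lang] = {"precision": p, "recall": r, "f_beta": f_beta(p, r, beta)}
    return scores


def consistency_gap(scores, metric="f_beta"):
    """Largest difference in a metric across languages (0.0 = fully consistent)."""
    values = [s[metric] for s in scores.values()]
    return max(values) - min(values)
```

For example, calling `per_language_scores` on records tagged `"zh"` and `"ja"` and then passing the result to `consistency_gap` gives a single number summarizing how far apart the model's Chinese and Japanese performance is. Implicit expression recognition and cultural adaptability could be scored the same way by grouping on an annotation label instead of on language.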

Section 06

Application Value: Significance for Academia, Industry, and Social Welfare

The release of JiraiBench matters to several groups:

  • Academic Research: Provides a standardized tool for interdisciplinary research between mental health and NLP, promoting reproducible research;
  • Industry: Serves as a test set for content safety systems, helping platforms optimize Jirai content moderation strategies;
  • Model Developers: Offers a capability diagnosis tool to guide model optimization;
  • Social Welfare: Improves the accuracy of risk content identification, providing earlier intervention opportunities for young people in psychological distress.

Section 07

Limitations and Future Directions: Paths for Continuous Optimization

Limitations of JiraiBench: It mainly covers Chinese and Japanese contexts, so its applicability to other languages still needs to be verified. Because Jirai culture keeps evolving, the dataset must also be kept current to remain useful. Future directions: expand language coverage, establish a dynamic update mechanism, develop fine-grained risk assessment models, and explore human-machine collaborative moderation.