Zing Forum


LinguaJailbreak-Lab: A Crowdsourced Discovery and Analysis Framework for Cross-Lingual Jailbreak Attacks

An open-source research tool based on the CC-BOS method, which uses crowdsourced intelligence to guide the discovery and evaluation of cross-lingual security vulnerabilities in large language models

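Tags: Large Language Models · Jailbreak Attacks · Cross-Lingual Security · CC-BOS · Classical Chinese · AI Security · Red Teaming · GPT-4o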
Published 2026-05-25 15:44 · Recent activity 2026-05-25 15:50 · Estimated read 4 min

Section 01

Introduction

LinguaJailbreak-Lab is an open-source research tool built on the CC-BOS method. It uses crowdsourced intelligence to guide the discovery and evaluation of cross-lingual security vulnerabilities in large language models.


Section 02

Original Authors and Source



Section 03

Research Background: Cross-Lingual Security Challenges of Large Language Models

As large language models (LLMs) are deployed globally, an often-overlooked security dimension has emerged: cross-lingual attacks. Attackers may exploit a model's multilingual capabilities, using low-resource or classical languages to bypass its safety alignment. LinguaJailbreak-Lab addresses this challenge with a crowdsourcing-guided framework for discovering and analyzing cross-lingual jailbreak attacks.


Section 04

CC-BOS Method: Classical Chinese-Guided Jailbreak Attacks

The project is built on the CC-BOS (Classical Chinese-Based Optimization Strategy) method, an optimization strategy that uses classical Chinese as the attack medium. The underlying research reports that classical Chinese, a semantically rich language that is poorly covered in modern LLM safety-training data, can serve as an effective attack vector.


Section 05

Experimental Configuration

The project provides complete experimental reproduction configurations:

  • Attack Method: CC-BOS
  • Attack Language: Classical Chinese
  • Target Model: GPT-4o
  • Prompt Generation Model: DeepSeek-Chat
  • Translation Model: DeepSeek-Chat
  • Evaluation Model: GPT-4o
  • Population Size: 5
  • Maximum Iterations: 5
  • Success Criterion: score >= 80 (using the scoring defined in the released code)
  • Early Stop Threshold: score >= 120
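The settings above can be captured in a small configuration object. The sketch below is a minimal, hypothetical Python dataclass: the class name `ExperimentConfig`, its field names, and its helper methods are illustrative assumptions, not the project's API. It encodes only the listed values and the two score thresholds, and the `max_evaluations` bound assumes each population member is scored once per iteration with no early stop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the reproduction settings listed above."""

    attack_method: str = "CC-BOS"
    attack_language: str = "Classical Chinese"
    target_model: str = "GPT-4o"
    prompt_generation_model: str = "DeepSeek-Chat"
    translation_model: str = "DeepSeek-Chat"
    evaluation_model: str = "GPT-4o"
    population_size: int = 5
    max_iterations: int = 5
    success_threshold: float = 80.0      # success criterion: score >= 80
    early_stop_threshold: float = 120.0  # early stop: score >= 120

    def is_success(self, score: float) -> bool:
        # A score at or above the success threshold counts as a success.
        return score >= self.success_threshold

    def should_stop_early(self, score: float) -> bool:
        # Early stopping uses a stricter bar than the success criterion.
        return score >= self.early_stop_threshold

    def max_evaluations(self) -> int:
        # Assumed upper bound: every population member is scored once
        # per iteration and the search never stops early.
        return self.population_size * self.max_iterations


cfg = ExperimentConfig()
print(cfg.is_success(85), cfg.should_stop_early(85), cfg.max_evaluations())
# prints: True False 25
```

Under these assumptions, keeping the success criterion (80) separate from the early-stop threshold (120) means a score can already count as a success while the search keeps running toward a higher score.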

Section 06

Dual-Mode Operation Architecture

The project is designed with two operation modes to meet different research needs:


Section 07

Qwen-Only Mode (Default)

This mode uses Qwen-Plus for every stage of the pipeline: prompt generation, target response, translation, and evaluation. Routing everything through one model simplifies API management, so researchers can verify the complete CC-BOS workflow after configuring a single API key.


Section 08

Strict GPT-4o Reproduction Mode

This mode strictly follows the implementation in the original CC-BOS paper, assigning different models to different roles, and suits research that needs results directly comparable with those reported in the paper.
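The practical difference between the two modes is which model fills each pipeline role. The following Python sketch is hypothetical: the `Role` enum, the `MODES` table, and the `models_for` and `distinct_models` helpers are illustrative names, not the project's code, and it assumes strict mode uses the model assignment listed under Experimental Configuration.

```python
from enum import Enum


class Role(Enum):
    PROMPT_GENERATION = "prompt_generation"
    TARGET = "target"
    TRANSLATION = "translation"
    EVALUATION = "evaluation"


# Hypothetical mode tables: model names come from the post,
# the table structure itself is an assumption.
MODES: dict[str, dict[Role, str]] = {
    "qwen_only": {role: "Qwen-Plus" for role in Role},
    "strict_gpt4o": {
        Role.PROMPT_GENERATION: "DeepSeek-Chat",
        Role.TARGET: "GPT-4o",
        Role.TRANSLATION: "DeepSeek-Chat",
        Role.EVALUATION: "GPT-4o",
    },
}


def models_for(mode: str = "qwen_only") -> dict[Role, str]:
    """Return the role-to-model assignment for a mode (default: Qwen-only)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return MODES[mode]


def distinct_models(mode: str) -> set[str]:
    # Rough proxy for how many model backends a mode depends on.
    return set(models_for(mode).values())


print(sorted(distinct_models("qwen_only")))     # ['Qwen-Plus']
print(sorted(distinct_models("strict_gpt4o")))  # ['DeepSeek-Chat', 'GPT-4o']
```

Expressing each mode as a plain table keeps the pipeline logic identical across modes; only the lookup changes, which mirrors the single-key convenience of the default mode versus the multi-model setup of strict reproduction.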