# Runner-Up Solution for Multilingual Polarization Detection: Practical Strategies for Gemma Model Ensembling and LLM Synthetic Data

> This article introduces the runner-up solution for the SemEval-2026 Multilingual Polarization Detection task, a binary classification task spanning 22 languages. Through LoRA fine-tuning of Gemma 3 models, GPT-4o-mini synthetic data augmentation, language-level threshold tuning, and weighted ensembling, the system achieved an average macro-F1 score of 0.811 and took first place in 3 languages.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-06T17:29:14.000Z
- Last activity: 2026-05-07T03:22:08.133Z
- Heat: 141.1
- Keywords: Multilingual NLP, Polarization Detection, Gemma Model, LoRA Fine-Tuning, Synthetic Data, SemEval Competition, Ensemble Learning, Data Augmentation
- Page link: https://www.zingnex.cn/en/forum/thread/gemmallm
- Canonical: https://www.zingnex.cn/forum/thread/gemmallm
- Markdown source: floors_fallback

---

## Key Points Overview of the Runner-Up Solution for Multilingual Polarization Detection

This article introduces the runner-up solution for the SemEval-2026 Task 9 Multilingual Polarization Detection task (a binary classification task across 22 languages). The PSK team achieved an average macro-F1 score of 0.811 through LoRA fine-tuning of Gemma 3 models, GPT-4o-mini synthetic data augmentation, language-level threshold tuning, and weighted ensembling, securing first place in 3 languages and ranking second among 42 teams.

## Task Background and Core Challenges

SemEval-2026 Task 9 focuses on multilingual polarization detection, a binary classification task across 22 languages. It aims to identify social polarization (the opposition and division of group opinions) in text, which is valuable for understanding the public-opinion ecosystem and mitigating social conflict.

Core challenges include:
1. Language diversity (22 languages spanning different language families and writing systems);
2. Scarcity of training data (especially for low-resource languages);
3. Cross-language generalization (models must maintain stable performance on unseen test data).

## Base Model and Parameter-Efficient Fine-Tuning Strategy

The team selected the Google Gemma 3 series (12B and 27B parameter versions) as the base architecture: it is openly available, has strong multilingual capabilities, and its parameter scale is moderate enough for fine-tuning with limited resources.

For each language, an independent model was fine-tuned with Low-Rank Adaptation (LoRA): the pre-trained weights were frozen and only low-rank update matrices were trained, which sharply reduces the number of trainable parameters while still allowing the model to adapt to language-specific patterns. A minimal sketch of this setup follows.
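
The sketch below shows what such a per-language LoRA setup could look like with Hugging Face `peft`; the checkpoint id, the use of a sequence-classification head, and all LoRA hyperparameters are illustrative assumptions rather than the team's reported configuration.

```python
# Per-language LoRA fine-tuning sketch (assumed hyperparameters, not the team's settings).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "google/gemma-3-12b-it"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    task_type="SEQ_CLS",
)

# Freezes the pre-trained weights; only the low-rank adapter matrices remain trainable.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# One adapter of this kind is trained per language on that language's training split.
```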

## LLM-Driven Synthetic Data Augmentation and Quality Control

To alleviate data scarcity, the team used GPT-4o-mini to generate synthetic data, exploring three strategies (a sketch of the rewriting strategy follows the list):
1. Direct generation: LLM generates polarized/non-polarized samples according to task definitions;
2. Rewriting augmentation: Semantically rewrite existing samples while keeping labels consistent;
3. Contrastive sample pairs: Generate sample pairs with similar semantics but opposite labels to strengthen the discriminative boundary.
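
As a rough illustration of the rewriting-augmentation strategy, the sketch below calls GPT-4o-mini through the OpenAI chat completions API; the prompt wording, temperature, and label encoding are assumptions, not the team's actual prompts.

```python
# Rewriting augmentation sketch: paraphrase a sample while preserving its label.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rewrite_sample(text: str, is_polarized: bool, language: str) -> str:
    """Ask GPT-4o-mini for a label-preserving paraphrase of one training sample."""
    label_desc = "polarized" if is_polarized else "non-polarized"
    prompt = (
        f"Paraphrase the following {language} text so that its meaning is preserved "
        f"and it remains {label_desc}. Return only the paraphrase.\n\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.9,  # higher temperature for lexical diversity (assumed)
    )
    return response.choices[0].message.content.strip()
```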

Quality control was implemented through a multi-stage filtering pipeline, including embedding-based deduplication: semantic similarity between samples is computed and near-duplicates are removed so the model does not overfit to redundant synthetic data. A deduplication sketch follows.
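
The deduplication step could look like the sketch below, which greedily drops any synthetic sample whose embedding is too similar to one already kept; the multilingual embedding model and the 0.9 cosine-similarity threshold are assumptions for illustration.

```python
# Embedding-based deduplication sketch (assumed embedding model and threshold).
import numpy as np
from sentence_transformers import SentenceTransformer

def deduplicate(texts: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a text only if its cosine similarity to every kept text stays below the threshold."""
    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    embeddings = model.encode(texts, normalize_embeddings=True)
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(float(np.dot(emb, embeddings[j])) < threshold for j in kept):
            kept.append(i)
    return [texts[i] for i in kept]
```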

## Key Optimization Techniques: Threshold Tuning and Ensemble Learning

1. Language-level threshold tuning: the decision threshold was tuned per language on the development set without retraining the model, improving F1 scores by 2-4% and revealing that the optimal classification threshold differs across languages;
2. Weighted ensemble strategy: the 12B and 27B models were combined through weighted ensembling with dynamic selection, exploiting the complementarity between the smaller model's local pattern capture and the larger model's semantic understanding (see the sketch below).
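
The sketch below illustrates both steps: the positive-class probabilities of the 12B and 27B models are combined with a fixed weight, and the decision threshold is then swept per language on the development set. The 0.05 grid step and the ensemble weight are assumptions, not the team's reported values.

```python
# Weighted ensembling plus per-language threshold tuning (assumed grid and weight).
import numpy as np
from sklearn.metrics import f1_score

def ensemble_probs(p_12b: np.ndarray, p_27b: np.ndarray, w: float = 0.4) -> np.ndarray:
    """Weighted average of the two models' positive-class probabilities."""
    return w * p_12b + (1.0 - w) * p_27b

def tune_threshold(y_true: np.ndarray, probs: np.ndarray) -> float:
    """Pick the decision threshold that maximizes macro-F1 on the development set."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.arange(0.05, 0.96, 0.05):
        f1 = f1_score(y_true, (probs >= t).astype(int), average="macro")
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

# Example: per-language thresholds computed from development-set predictions.
# thresholds = {lang: tune_threshold(dev_labels[lang],
#                                    ensemble_probs(p12[lang], p27[lang]))
#               for lang in languages}
```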

## Experimental Results and Generalization Ability Analysis

The final system achieved an average macro-F1 of 0.811 across 22 languages (ranking second), with first place in 3 languages and top three in 8 languages.

Key findings: architectures such as XLM-RoBERTa and Qwen3 performed strongly on the development set, but their F1 scores dropped by 30-50% on the test set, whereas the Gemma-based solution performed consistently on both, underscoring how much architecture selection and training strategy matter for generalization.

## Technical Insights and Application Prospects

**Technical Insights**:
1. Synthetic data requires careful curation: each stage of the pipeline (generation, filtering, deduplication) affects downstream performance;
2. Hyperparameter optimization (e.g., language-level thresholds) is cost-effective;
3. Generalization ability should be a core evaluation metric.

**Application Prospects**:
- Social media content moderation: Identify multilingual content that exacerbates conflicts;
- Public opinion monitoring: Help governments/institutions analyze multilingual public opinion polarization trends;
- Transfer to related tasks: the approach can be applied to other multilingual tasks such as sentiment analysis and hate speech detection.

Conclusion: this solution demonstrates competition best practices and provides a practical reference for multilingual NLP research.
