Learning from Human Labeling Variability: Capturing Personalized Explanatory Behavior via Cross-Annotator Preference Optimization

This paper proposes the Cross-Annotator Preference Optimization (CAPO) method, enabling large language models to learn and replicate the label-explanation behavior patterns of specific annotators. The study shows that Human Labeling Variability (HLV) can serve as a stable signal for training models to understand annotators' personalized reasoning preferences.

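Keywords: human labeling variability, cross-annotator preference optimization, large language models, personalization, explainability, data annotation, preference optimization, natural language inference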

Published 2026-05-28 01:55 | Recent activity 2026-05-28 12:49 | Estimated read 5 min

Section 01

[Introduction] CAPO Method: Learning Personalized Explanatory Behavior Using Human Labeling Variability

This paper proposes the Cross-Annotator Preference Optimization (CAPO) method, aiming to enable large language models (LLMs) to learn and replicate the label-explanation behavior patterns of specific annotators. The core finding of the study is that Human Labeling Variability (HLV) can serve as a stable signal to help models understand annotators' personalized reasoning preferences.

Section 02

Research Background: Reconsidering Human Labeling Variability (HLV)

Traditionally, HLV in natural language processing annotation has been treated as noise, but recent studies suggest that it reflects legitimate differences in perspective and preference among annotators. Free-text explanations offer a window into HLV because they reveal the reasoning and preferences behind each annotator's label choice. The core question is whether LLMs can learn and replicate the label-explanation behavior of a specific annotator.

Section 03

Research Design: Task Selection and Data Collection

The study selected two sentence-pair tasks: Natural Language Inference (NLI), which judges the logical relationship between two sentences, and Paraphrase Identification, which judges whether two sentences have the same meaning. Each task was annotated by 4 different annotators to provide enough data for analyzing patterns of individual difference.

Section 04

Core Methods: CAPO vs. Existing Approaches

Three methods are compared:

Prompting: Describe the target annotator's style directly in the prompt; performance is limited and unstable;
Supervised Fine-Tuning (SFT): Fine-tune on data from the specific annotator; this performs better than prompting;
CAPO (Cross-Annotator Preference Optimization): Learn the target annotator's unique patterns by contrasting their responses with those of other annotators. In detail, CAPO treats the target annotator's responses as positive examples and other annotators' responses as negative examples, applies preference optimization to these pairs, and balances label consistency against explanation quality.
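
The pairing step described for CAPO can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Annotation` and `PreferencePair` schemas, the `build_pairs` helper, and the choice to pair responses on the same item are hypothetical, and the summary does not say which preference optimization objective the paper uses. The `dpo_loss` function shows the standard Direct Preference Optimization loss for a single pair as one possible objective, not as the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotator's response to one item (hypothetical schema)."""
    item_id: str
    annotator: str
    label: str
    explanation: str


@dataclass(frozen=True)
class PreferencePair:
    """Target annotator's response (chosen) vs. another annotator's (rejected)."""
    item_id: str
    chosen: Annotation
    rejected: Annotation


def build_pairs(annotations, target):
    """Pair the target annotator's response with each other annotator's
    response to the same item (assumed pairing scheme)."""
    by_item = {}
    for ann in annotations:
        by_item.setdefault(ann.item_id, []).append(ann)

    pairs = []
    for item_id, group in by_item.items():
        chosen = [a for a in group if a.annotator == target]
        rejected = [a for a in group if a.annotator != target]
        for c in chosen:
            for r in rejected:
                pairs.append(PreferencePair(item_id, c, r))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities under
    the policy and a frozen reference model. Using DPO here is an assumption."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    data = [
        Annotation("i1", "A", "entailment", "The premise states it directly."),
        Annotation("i1", "B", "neutral", "The premise leaves this open."),
        Annotation("i1", "C", "entailment", "It follows from the first clause."),
        Annotation("i1", "D", "contradiction", "The tenses conflict."),
    ]
    pairs = build_pairs(data, target="A")
    print(len(pairs), "pairs for annotator A")
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In this sketch, a pair is built even when another annotator happens to give the same label as the target, so the explanation text alone can carry the preference signal. Whether the paper filters or weights such pairs is not stated in this summary.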

Section 05

Experimental Evidence: Performance of the CAPO Method

Experimental results:

Prompting: Baseline performance is limited, and it captures individual patterns unreliably;
SFT: Significantly better than prompting, and it learns annotator-specific behaviors effectively;
CAPO: Improves further over SFT, achieves the best results on multiple dimensions, and generalizes to new inputs, which suggests that it learns a transferable representation of the annotator's style rather than memorizing training examples.
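
One way to check generalization to new inputs is to hold out whole items, so that no test input appears in training, and then measure how often the model's label matches the target annotator's label on those items. The sketch below assumes this protocol; the summary does not describe the paper's actual evaluation setup or metrics, and `split_by_item` and `label_agreement` are hypothetical helper names.

```python
import random


def split_by_item(item_ids, test_fraction=0.2, seed=0):
    """Split unique item IDs into disjoint train and test sets, so that
    test inputs are never seen during training (assumed protocol)."""
    unique = sorted(set(item_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])


def label_agreement(predicted, gold):
    """Fraction of shared items where the predicted label equals the
    target annotator's label. Both arguments map item_id -> label."""
    shared = predicted.keys() & gold.keys()
    if not shared:
        raise ValueError("no items in common between predictions and gold labels")
    return sum(predicted[i] == gold[i] for i in shared) / len(shared)


if __name__ == "__main__":
    train_ids, test_ids = split_by_item([f"i{n}" for n in range(10)])
    print(len(train_ids), "train items,", len(test_ids), "test items")
    print(label_agreement({"i1": "neutral", "i2": "entailment"},
                          {"i1": "neutral", "i2": "contradiction"}))
```

Label agreement covers only the label half of label-explanation behavior; assessing explanation style would need a separate measure, which this sketch does not attempt.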

Section 06

Research Conclusions and Application Prospects

Conclusion: The results indicate that HLV can be learned as a stable signal of annotator-specific label-explanation behavior. Application prospects include:

Personalized model services: Matching specific user/scenario preferences;
Scalable explanatory annotation: Learning explanation styles based on history;
Improvement of annotation quality: Reducing unnecessary disagreements while preserving perspective diversity;
Optimization of human-machine collaboration: Designing better auxiliary annotation systems.

Section 07

Limitations and Future Research Directions

Limitations: The study covers only two tasks, the amount of annotator data is limited, and the interpretability of the learned model representations needs improvement. Future directions: Expand to more tasks and domains, combine the approach with active learning to collect data more efficiently, and develop better evaluation metrics for the quality of annotator modeling.
