Dynamic Alignment: Rethinking Human-AI Value Alignment Evaluation Through Longitudinal Research

This paper proposes shifting from single-moment preference collection to a longitudinal, contextualized alignment measurement approach. Through the BITE browser system, the study found significant differences between users' immediate preferences and subsequent reflections, revealing the limitations of traditional alignment evaluation methods.

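Tags: human-AI alignment, longitudinal research, preference evaluation, RLHF, privacy protection, AI safety, user experience, value alignment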

Published 2026-05-06 01:51 | Recent activity 2026-05-06 11:20 | Estimated read 7 min

Section 01

【Introduction】Dynamic Alignment: Rethinking the Temporal Dimension of Human-AI Value Alignment

Core argument: Current human-AI alignment evaluation treats user preferences as static and relies on feedback collected immediately after an interaction, as in the preference data used for RLHF and DPO. In practice, user preferences change over time and with context. The paper proposes a longitudinal, contextualized alignment measurement framework and evaluates it with the BITE browser system. It finds significant differences between users' immediate preferences and their later reflections, which exposes the limitations of traditional methods.

Section 02

【Background】Flaws in the Static Assumptions of Current Human-AI Alignment Evaluation

The Neglected Temporal Dimension

Current LLM alignment research (e.g., RLHF, DPO) assumes that user preferences are static and relies on feedback given immediately at the end of an interaction. Real-world decisions, however, unfold over time: once the consequences of an AI-assisted decision become visible, users often re-evaluate the output that led to it.

Limitations of Immediate Feedback

Time-extended nature of decisions: The actual consequences of AI outputs (e.g., email replies, travel experiences) affect final evaluations, but existing datasets ignore this dimension.
Cognitive limitations: Immediate judgments are subject to cognitive biases (e.g., the availability heuristic and the anchoring effect), while evaluations made after careful consideration tend to be more rational.

Section 03

【Methodology】Longitudinal Alignment Measurement Framework and BITE System Implementation

Three-Pronged Framework

In-context preference capture: Collect immediate feedback as the starting point for tracking.
Context-triggered subsequent reflection: Trigger re-evaluation at key decision points (e.g., receiving an email reply, completing a purchase).
Privacy-preserving behavior trajectory: Collect de-identified behavior data to explain preference changes, with a user-led consent mechanism.
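
The three prongs above can be read as three fields of a single per-interaction record. The following is a minimal sketch under stated assumptions, not the paper's actual schema: the class names, the 1-5 rating scale, and the trigger and event labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reflection:
    """A later re-evaluation, triggered by a contextual event."""
    trigger: str        # hypothetical label, e.g. "email_reply_received"
    rating: int         # hypothetical 1-5 scale
    days_after: float   # time elapsed since the original interaction


@dataclass
class PreferenceRecord:
    """One LLM interaction tracked over time (illustrative schema only)."""
    interaction_id: str
    immediate_rating: int                                         # prong 1
    reflections: List[Reflection] = field(default_factory=list)   # prong 2
    trajectory_consent: bool = False                              # prong 3: user-led consent
    trajectory: List[str] = field(default_factory=list)           # de-identified event labels

    def add_reflection(self, trigger: str, rating: int, days_after: float) -> None:
        self.reflections.append(Reflection(trigger, rating, days_after))

    def log_behavior(self, event_label: str) -> bool:
        """Store a behavior event only if the user has consented."""
        if not self.trajectory_consent:
            return False
        self.trajectory.append(event_label)
        return True

    def latest_rating(self) -> int:
        """Most recent reflection if any, else the immediate rating."""
        if self.reflections:
            return max(self.reflections, key=lambda r: r.days_after).rating
        return self.immediate_rating

    def shift(self) -> int:
        """Positive if the rating improved over time, negative if it dropped."""
        return self.latest_rating() - self.immediate_rating
```

Keeping the immediate rating and the reflections in the same record makes the shift between them a direct lookup, and gating `log_behavior` on the consent flag means no trajectory data is stored by default.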

BITE System Implementation

Key interaction detection: Identify impactful LLM interactions (decision-making, planning, etc.).
Progressive consent: Request permissions in stages; users can manage data at any time.
Contextualized reflection triggering: Prompt reflection at relevant moments (returning to view outputs, new operations linked to old decisions).
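
As a rough illustration of how progressive consent and contextualized triggering could interact, the sketch below only prompts for reflection when a new event relates to a tracked interaction and the user has granted the relevant consent stage. The stage names, topic tags, and function names are assumptions made for this example; the source does not describe BITE's internals at this level of detail.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical consent stages, each unlocking a bit more data collection.
CONSENT_STAGES = ["ratings_only", "reflection_prompts", "behavior_trajectory"]


@dataclass
class TrackedInteraction:
    interaction_id: str
    topic: str               # hypothetical tag, e.g. "travel_booking"
    reflected: bool = False  # True once the user has already re-evaluated it


def allowed(granted: Set[str], stage: str) -> bool:
    """A stage is usable only if it and every earlier stage were granted."""
    idx = CONSENT_STAGES.index(stage)
    return all(s in granted for s in CONSENT_STAGES[: idx + 1])


def interactions_to_reflect_on(
    event_topic: str,
    tracked: List[TrackedInteraction],
    granted: Set[str],
) -> List[str]:
    """Return IDs of past interactions that a new event should prompt reflection on."""
    if not allowed(granted, "reflection_prompts"):
        return []
    return [
        t.interaction_id
        for t in tracked
        if t.topic == event_topic and not t.reflected
    ]
```

Matching on a single topic tag is a deliberate simplification; a real system would need a richer notion of "a new operation linked to an old decision".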

Section 04

【Evidence】Key Findings from a Two-Week Longitudinal Study

Differences Between Immediate and Subsequent Preferences

A two-week study with 8 participants showed:

Accuracy: Some answers marked as 'accurate' immediately were later rated as 'partially accurate' or 'misleadingly accurate' (missing key information).
Relevance: Many outputs initially considered 'relevant and useful' were found not to solve the problem once they were actually applied.

Patterns of Preference Change

Satisfaction → Disappointment: Good on the surface but with practical limitations;
Skepticism → Recognition: Found effective after verification;
Context-dependent shift: The same output is evaluated differently in different contexts.
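
The three patterns above could be labeled mechanically once immediate and later ratings are on a common scale. The sketch below is one possible rule set, not the paper's coding scheme: the 1-5 scale, the `min_shift` threshold of 2, and the label strings are assumptions.

```python
from typing import Dict


def classify_change(
    immediate: int,
    later_by_context: Dict[str, int],
    min_shift: int = 2,
) -> str:
    """Label how a rating changed between the immediate and later evaluations.

    later_by_context maps a hypothetical context name to the rating given there.
    """
    ratings = list(later_by_context.values())
    if not ratings:
        return "no_reflection"
    # Same output, clearly different ratings depending on context.
    if max(ratings) - min(ratings) >= min_shift:
        return "context_dependent_shift"
    delta = sum(ratings) / len(ratings) - immediate
    if delta <= -min_shift:
        return "satisfaction_to_disappointment"
    if delta >= min_shift:
        return "skepticism_to_recognition"
    return "stable"
```

For example, `classify_change(5, {"after_use": 2})` yields `"satisfaction_to_disappointment"`, while `classify_change(4, {"work": 5, "personal": 2})` yields `"context_dependent_shift"` because the spread across contexts is checked first.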

Implications for Existing Datasets

Datasets based on immediate feedback may have systematic biases, leading to overestimation of model performance and misjudgment of alignment levels.
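
One way to make this overestimation concrete is to compare paired ratings for the same outputs. The sketch below assumes each output has one immediate and one reflected rating on a hypothetical 1-5 scale; the function names and the "favorable" cutoff of 4 are illustrative, and the numbers in the usage note are invented, not results from the study.

```python
from statistics import mean
from typing import List, Tuple


def overestimation_gap(pairs: List[Tuple[int, int]]) -> float:
    """Mean immediate rating minus mean reflected rating.

    A positive value means immediate feedback rated the outputs higher
    than users did on reflection.
    """
    immediate = [i for i, _ in pairs]
    reflected = [r for _, r in pairs]
    return mean(immediate) - mean(reflected)


def reversal_rate(pairs: List[Tuple[int, int]], favorable: int = 4) -> float:
    """Share of initially favorable ratings that were no longer favorable later."""
    initially_favorable = [(i, r) for i, r in pairs if i >= favorable]
    if not initially_favorable:
        return 0.0
    reversed_count = sum(1 for _, r in initially_favorable if r < favorable)
    return reversed_count / len(initially_favorable)
```

With the made-up pairs `[(5, 3), (4, 4), (5, 2), (2, 4)]`, the gap is 0.75 rating points and two of the three initially favorable ratings are later reversed.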

Section 05

【Significance and Limitations】Value and Constraints of the Longitudinal Alignment Approach

Significance

AI Safety: Detect long-term issues earlier (reward hacking, value drift);
User Experience: Products need to provide re-evaluation mechanisms so that user profiles do not become fixed around outdated preferences.

Limitations

Small sample size (8 participants);
Short time span (two weeks);
Self-selection bias (participants were highly receptive to new technology);
Scenario limitation (browser environment only).

Section 06

【Future Directions】Next Steps in Dynamic Alignment Research

Future Exploration Directions

Large-scale longitudinal datasets: Build datasets spanning months/years with thousands of users;
Dynamic alignment training: Develop algorithms that use longitudinal signals (online learning, continuous adaptation);
Cross-cultural research: Explore preference changes across different cultures;
Automated detection: Automatically identify preference changes without explicit queries.

Conclusion

Human-AI alignment needs to embrace the temporal dimension; static preference signals cannot reflect real needs. The longitudinal perspective is key to building truly aligned AI systems, especially when AI is involved in important decisions.