# HDM-HMM: A New AI Detection Method for Mixed Authorship Based on Sequential Stylometry

> HDM-HMM is an AI detection method for mixed-authorship documents. It infers authorship at the word level using a Hierarchical Dirichlet-Multinomial Hidden Markov Model. On text co-created by humans and AI, its reported error rate is about 40% lower than that of a Multinomial HMM and over 60% lower than that of the best rolling stylometry method.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-29T22:45:38.000Z
- Last activity: 2026-03-29T22:50:09.326Z
- Popularity: 156.9
- Keywords: HDM-HMM, AI detection, mixed authorship, stylometry, Hidden Markov Model, function words, sequence labeling, text forensics, academic integrity, Bayesian hierarchical model, natural language processing
- Page link: https://www.zingnex.cn/en/forum/thread/hdm-hmm-ai
- Canonical: https://www.zingnex.cn/forum/thread/hdm-hmm-ai

---

## HDM-HMM: Introduction to the New AI Detection Method for Mixed Authorship

HDM-HMM is an AI detection method for mixed-authorship documents, that is, documents co-created by humans and AI. It uses a Hierarchical Dirichlet-Multinomial Hidden Markov Model to infer authorship at the word level. By treating detection as a sequence labeling problem, it addresses the failure of traditional binary classification methods in real-world mixed scenarios. Its reported error rate is about 40% lower than that of a Multinomial HMM baseline, which makes it a candidate tool for maintaining academic integrity and information authenticity.

## Practical Challenges in Mixed Authorship Detection

Most existing detectors of AI-generated text assume that a document is either entirely human-written or entirely AI-generated. That binary assumption holds in lab settings but breaks down in practice, where many documents have mixed authorship: a human writes part of the content, and an AI generates, modifies, or continues the rest. Examples include a book review that opens with the reviewer's own thoughts and closes with an AI-written summary, or a report where a human builds the framework and an AI fills in the details. Document-level detection returns a single verdict for the whole text, so it cannot locate the AI-written segments.

## Technical Framework and Core Innovations of HDM-HMM

HDM-HMM treats detection as a sequence labeling problem in which each word is labeled as human or AI. It uses a Hidden Markov Model (HMM) as the base framework and adds Hierarchical Dirichlet-Multinomial modeling to address data sparsity. The features are function words (200 categories, including articles and conjunctions), which balance stability and interpretability. The Viterbi algorithm then decodes the most likely label sequence, giving word-level inference and boundary detection and capturing where the writing style switches.
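The decoding step described above can be sketched as a two-state Viterbi pass over function-word symbols. This is a minimal illustration, not the method's actual implementation: the five-symbol vocabulary, the helper names (`to_symbols`, `viterbi`, `log_table`), and every probability below are made-up assumptions, and the hierarchical Dirichlet-Multinomial estimation of the emission tables is not shown.

```python
import math

# Hypothetical toy vocabulary. The source describes about 200 function-word
# categories; these five symbols are for illustration only.
FUNCTION_WORDS = ["the", "and", "of", "however", "OTHER"]


def to_symbols(tokens):
    """Map each token to a function-word symbol, or 'OTHER' if it is not listed."""
    known = set(FUNCTION_WORDS) - {"OTHER"}
    return [t.lower() if t.lower() in known else "OTHER" for t in tokens]


def log_table(table):
    """Convert a nested dict of probabilities into log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


def viterbi(symbols, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for the observed symbols."""
    if not symbols:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][symbols[0]] for s in states}]
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for sym in symbols[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][sym]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up parameters: "sticky" transitions encode that authorship rarely switches.
STATES = ["human", "ai"]
LOG_START = {"human": math.log(0.5), "ai": math.log(0.5)}
LOG_TRANS = log_table({
    "human": {"human": 0.95, "ai": 0.05},
    "ai": {"human": 0.05, "ai": 0.95},
})
LOG_EMIT = log_table({
    "human": {"the": 0.15, "and": 0.30, "of": 0.10, "however": 0.05, "OTHER": 0.40},
    "ai": {"the": 0.30, "and": 0.10, "of": 0.15, "however": 0.15, "OTHER": 0.30},
})

labels = viterbi(to_symbols("And and and and however however however however".split()),
                 STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

The point where `labels` changes from one state to the other is the estimated authorship boundary. High self-transition probabilities keep the decoder from flipping labels on a single atypical word, which is one plausible reading of how sequence context helps boundary detection.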

## Experimental Design and Comparative Results

The experiments use a mixed-authorship dataset built from human-written Amazon book review segments followed by GPT continuations, under three scenarios: balanced mixing, short AI segments, and AI-dominated text. The baselines include a Multinomial HMM, rolling stylometry methods, and GPT-2 perplexity. HDM-HMM has the lowest error rate in every scenario: 4.4% for balanced mixing, 5.1% for short AI segments, and 3.2% for AI-dominated text. This is about 40% lower than the Multinomial HMM and over 60% lower than the best rolling method.
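To make the metrics concrete, the sketch below computes a word-level error rate and a relative error reduction. It assumes the reported error rate is the fraction of words whose human/AI label is wrong, and that "X% lower" means a relative reduction; the source does not spell out either definition. The function names and all example numbers are hypothetical.

```python
def word_error_rate(true_labels, predicted_labels):
    """Fraction of words whose predicted author label differs from the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def relative_reduction(baseline_error, model_error):
    """Relative drop in error rate when moving from a baseline to a model."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical 10-word document: the true boundary is after word 6,
# while the prediction places it after word 5, so one word is mislabeled.
truth = ["human"] * 6 + ["ai"] * 4
predicted = ["human"] * 5 + ["ai"] * 5
error = word_error_rate(truth, predicted)

# Hypothetical error rates, not figures from the experiments.
reduction = relative_reduction(baseline_error=0.10, model_error=0.06)
```

Under these definitions, a boundary that is off by one word in a 10-word text costs one tenth of the total, which is why short AI segments are a natural stress test for word-level methods.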

## Analysis of Advantages and Limitations of HDM-HMM

Advantages: sequence modeling uses context to improve the accuracy of boundary judgments; regularization from the hierarchical Dirichlet prior improves robustness; and function-word features provide interpretability.

Limitations: the method depends on a fixed function-word list, which may not transfer to some languages or domains; it supports only two categories, human and AI; and its computational cost is high for long documents.
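The regularizing effect of a Dirichlet prior can be illustrated with the standard Dirichlet-multinomial posterior mean, which shrinks sparse per-author counts toward a shared base distribution. This is a textbook formula used as a sketch, not the exact hierarchical formulation of HDM-HMM; the function name `posterior_mean`, the four-word vocabulary, and the concentration value are assumptions made for illustration.

```python
def posterior_mean(counts, base, concentration):
    """Posterior mean of multinomial probabilities under a Dirichlet prior.

    The prior is Dirichlet(concentration * base), so each smoothed estimate is
    (count_w + concentration * base_w) / (total_count + concentration).
    """
    total = sum(counts.get(w, 0) for w in base)
    return {
        w: (counts.get(w, 0) + concentration * base[w]) / (total + concentration)
        for w in base
    }


# Hypothetical shared base distribution over four function words.
base = {"the": 0.25, "and": 0.25, "of": 0.25, "but": 0.25}

# Sparse counts: only three observations, all of the same word.
sparse_counts = {"the": 3}

smoothed = posterior_mean(sparse_counts, base, concentration=4.0)
```

With raw relative frequencies, every word except "the" would get probability zero, and a single unseen word would rule out a state entirely during decoding. The smoothed estimate keeps all probabilities positive, and a larger `concentration` pulls the estimate harder toward the base distribution. In a hierarchical model the base distribution is itself learned from pooled data rather than fixed as it is here.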

## Practical Application Scenarios of HDM-HMM

In academic integrity, it can give a fine-grained assessment of how much AI assistance went into students' homework. In news and publishing, it can detect undeclared AI content in submissions. In legal forensics, it can assist in document authorship analysis. In AI security research, it can help advance both attack and defense techniques.

## Research Significance and Prospects of HDM-HMM

HDM-HMM marks three shifts: from document-level classification to word-level sequence labeling, from black-box models to interpretable probabilistic models, and from single-author to mixed-authorship assumptions. Beyond improving detection accuracy, it deepens the understanding of human-AI collaborative writing and provides a tool for answering the question of how much humans and AI each contributed to a document.