Zing Forum

HDM-HMM: A New AI Detection Method for Mixed Authorship Based on Sequential Stylometry

HDM-HMM is a new AI detection method for documents with mixed authorship. It uses a Hierarchical Dirichlet-Multinomial Hidden Markov Model to infer the author of each individual word. On text co-written by humans and AI, it reduces the error rate by about 40% relative to a standard multinomial HMM and by more than 60% relative to the best rolling-window stylometry baseline.

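Tags: HDM-HMM, AI detection, mixed authorship, stylometry, hidden Markov model, function words, sequence labeling, text forensics, academic integrity, Bayesian hierarchical model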
Published 2026-03-30 06:45

Section 01

HDM-HMM: Introduction to the New AI Detection Method for Mixed Authorship

HDM-HMM is a detection method for documents co-written by humans and AI. It uses a Hierarchical Dirichlet-Multinomial Hidden Markov Model to infer, word by word, who wrote the text. It treats detection as a sequence labeling problem rather than a whole-document binary classification. This lets it handle the real-world mixed documents on which binary classifiers break down. It reduces the error rate by roughly 40% relative to a standard multinomial HMM and offers a new tool for upholding academic integrity and information authenticity.

Section 02

Practical Challenges in Mixed Authorship Detection

Most existing detectors of AI-generated text assume a binary split: a document is either entirely human-written or entirely AI-generated. That assumption holds in controlled lab settings but often fails in practice. Many real documents have mixed authorship: a human writes part of the content and an AI generates, edits, or continues the rest. Examples include a book review that opens with the reviewer's own impressions and closes with an AI-written summary, or a report whose framework a human drafted and whose details an AI filled in. A single verdict for the whole document cannot show where the AI-written segments are.

Section 03

Technical Framework and Core Innovations of HDM-HMM

HDM-HMM treats detection as sequence labeling, in which every word receives a human or AI label. A Hidden Markov Model (HMM) serves as the basic framework, and Hierarchical Dirichlet-Multinomial modeling is added to mitigate data sparsity. The features are function words, drawn from a fixed inventory of 200 categories that includes articles, conjunctions, and the like. They were chosen because they are both stable and interpretable. The HMM's transitions capture how writing style switches between authors. Viterbi decoding then yields the most likely word-level labeling, which also marks the boundaries between human and AI segments.
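
To make the decoding step concrete, here is a minimal Python sketch of a two-state (human/AI) HMM over function-word symbols. Several parts are assumptions that do not come from the source:

- A ten-word toy inventory stands in for the 200 function-word categories.
- All content words are collapsed into one `OTHER` symbol.
- The transition "stickiness" `stay` is fixed by hand.
- The training counts are invented.
- Emissions are the posterior mean under a per-author Dirichlet prior centred on a distribution pooled across both authors. This is a simple empirical-Bayes stand-in for the hierarchical Dirichlet-multinomial layer; the actual model may place and infer that layer differently, for example by sampling.

```python
import math
from collections import Counter

# Hypothetical toy inventory; the described method uses 200 function-word categories.
FUNCTION_WORDS = ["the", "a", "and", "of", "to", "in", "however", "moreover", "but", "i"]
OTHER = "<other>"  # assumption: every content word collapses into this one symbol
VOCAB = FUNCTION_WORDS + [OTHER]
STATES = ["human", "ai"]


def to_symbols(tokens):
    """Map each token to its lower-cased function word, or to OTHER."""
    fw = set(FUNCTION_WORDS)
    return [t.lower() if t.lower() in fw else OTHER for t in tokens]


def smoothed_emissions(counts_by_state, alpha=10.0):
    """Posterior-mean emissions under a Dirichlet(alpha * base) prior per state.

    `base` is pooled over all states (add-one smoothed), so sparse
    state-specific counts are shrunk toward a shared distribution.
    """
    pooled = Counter()
    for counts in counts_by_state.values():
        pooled.update(counts)
    total = sum(pooled[w] + 1 for w in VOCAB)
    base = {w: (pooled[w] + 1) / total for w in VOCAB}
    emissions = {}
    for state, counts in counts_by_state.items():
        n = sum(counts[w] for w in VOCAB)
        emissions[state] = {w: (counts[w] + alpha * base[w]) / (n + alpha) for w in VOCAB}
    return emissions


def viterbi(symbols, emissions, stay=0.95):
    """Most likely human/AI label for each symbol (non-empty) under a 2-state HMM.

    `stay` is the probability that the next word keeps the same author;
    a high value encodes the assumption that authorship switches rarely.
    """
    log_trans = {s: {t: math.log(stay if s == t else 1 - stay) for t in STATES} for s in STATES}
    start = math.log(1 / len(STATES))  # uniform start distribution
    score = {s: start + math.log(emissions[s][symbols[0]]) for s in STATES}
    backptrs = []
    for sym in symbols[1:]:
        prev, score, ptr = score, {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + log_trans[s][t])
            score[t] = prev[best] + log_trans[best][t] + math.log(emissions[t][sym])
            ptr[t] = best
        backptrs.append(ptr)
    state = max(STATES, key=lambda s: score[s])  # best final state
    path = [state]
    for ptr in reversed(backptrs):  # follow back-pointers to the first word
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Invented training counts, for illustration only.
counts = {
    "human": Counter({"i": 8, "but": 6, "the": 10, "a": 6, OTHER: 40}),
    "ai": Counter({"however": 6, "moreover": 6, "the": 12, "and": 8, OTHER: 40}),
}
emissions = smoothed_emissions(counts)
tokens = "I think the plot works but I wanted more . Moreover , the pacing and tone are however uneven".split()
labels = viterbi(to_symbols(tokens), emissions)
print(list(zip(tokens, labels)))
```

Raising `alpha` pulls both authors' emission distributions toward the shared base. That is the kind of regularization a hierarchical prior provides when a function word is rare in one class. Raising `stay` makes the decoder less willing to switch authors on weak evidence.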

Section 04

Experimental Design and Comparative Results

The study built a mixed-authorship dataset by combining human-written segments of Amazon book reviews with GPT-generated continuations. It tested three scenarios: balanced mixing, short AI segments, and AI-dominated documents. Baselines included a standard multinomial HMM, rolling-window stylometry methods, and GPT-2 perplexity scoring, among others. HDM-HMM had the lowest error rate in every scenario: 4.4% with balanced mixing, 5.1% with short AI segments, and 3.2% when AI dominated. That is about 40% lower than the multinomial HMM and more than 60% lower than the best rolling method.
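The evaluation arithmetic can be sketched in Python under two assumptions the summary does not spell out. The first is that the error rate is the fraction of mislabelled words. The second is that "X% lower" means a relative rather than absolute reduction. `splice`, `error_rate`, and `relative_reduction` are hypothetical helpers, and every token and number below is invented, not taken from the study.

```python
def splice(human_tokens, ai_tokens):
    """Join a human-written segment and an AI continuation, labelling each token."""
    tokens = list(human_tokens) + list(ai_tokens)
    labels = ["human"] * len(human_tokens) + ["ai"] * len(ai_tokens)
    return tokens, labels


def error_rate(predicted, gold):
    """Fraction of tokens whose predicted author label differs from the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("expected two non-empty label sequences of equal length")
    return sum(p != g for p, g in zip(predicted, gold)) / len(gold)


def relative_reduction(new_error, baseline_error):
    """Share of the baseline's errors removed: 0.4 means 40% fewer errors."""
    return (baseline_error - new_error) / baseline_error


# Toy document: 5 human words followed by 5 AI words (invented text).
tokens, gold = splice("I loved the first half".split(),
                      "Moreover the ending resonates deeply".split())
predicted = ["human"] * 4 + ["ai"] * 6  # boundary placed one word too early
print(error_rate(predicted, gold), relative_reduction(0.06, 0.10))
```

Under this reading, a system at a 0.06 error rate against a baseline at 0.10 has removed 40% of the baseline's errors. Its absolute error rate, however, dropped by only four percentage points.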

Section 05

Analysis of Advantages and Limitations of HDM-HMM

Advantages: sequence modeling uses surrounding context to place author boundaries more accurately; the hierarchical Dirichlet prior acts as a regularizer that makes the estimates more robust; and function-word features keep the model interpretable. Limitations: the fixed function-word list may not carry over to some languages or domains; the model distinguishes only two classes, human and AI; and computation becomes expensive for long documents.
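Because the features are function words, a decision can be traced back to specific words. Here is a minimal Python sketch of one way to do this, assuming the model exposes per-author emission probabilities. The probabilities below are invented. The log-likelihood ratio of a word under the two authors shows how strongly, and in which direction, that word pushes the label. `evidence` and `rank_cues` are hypothetical helpers.

```python
import math

# Hypothetical per-author emission probabilities for five function words
# (invented for illustration; not estimates from the paper).
EMISSIONS = {
    "human": {"i": 0.10, "but": 0.08, "the": 0.14, "moreover": 0.006, "however": 0.006},
    "ai": {"i": 0.007, "but": 0.006, "the": 0.16, "moreover": 0.09, "however": 0.08},
}


def evidence(word, emissions=EMISSIONS):
    """Log-likelihood ratio log p(word | ai) / p(word | human); positive favours AI."""
    return math.log(emissions["ai"][word] / emissions["human"][word])


def rank_cues(emissions=EMISSIONS):
    """Function words ordered from strongest AI cue to strongest human cue."""
    return sorted(emissions["ai"], key=lambda w: evidence(w, emissions), reverse=True)


for word in rank_cues():
    print(f"{word:>9}  {evidence(word):+.2f}")
```

Summing these ratios over a window of words gives a rough, human-readable account of why a passage was labelled AI. The HMM transitions then decide whether that evidence is strong enough to justify a switch.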

Section 06

Practical Application Scenarios of HDM-HMM

In academic integrity, it can help gauge how much AI assistance went into a student's assignment. In news and publishing, it can flag undisclosed AI content in submissions. In legal forensics, it can support authorship analysis of documents. In AI security research, it can drive progress on both attack (evasion) and defense (detection) techniques.
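To estimate the degree of AI assistance, word-level labels would typically be summarized for a human reviewer. Here is a minimal Python sketch, assuming the labels arrive as a list of "human"/"ai" strings. `ai_spans` and `ai_share` are hypothetical helpers, not part of any published tool.

```python
from itertools import groupby


def ai_spans(labels):
    """Return (start, end) token index ranges labelled 'ai'; `end` is exclusive."""
    spans, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label == "ai":
            spans.append((start, start + length))
        start += length
    return spans


def ai_share(labels):
    """Fraction of tokens labelled 'ai' (0.0 for an empty document)."""
    return sum(label == "ai" for label in labels) / len(labels) if labels else 0.0


labels = ["human", "human", "ai", "ai", "ai", "human", "ai"]
print(ai_spans(labels), round(ai_share(labels), 2))
```

If the report lists spans rather than only the overall share, a reviewer can read the flagged passages directly.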

Section 07

Research Significance and Prospects of HDM-HMM

HDM-HMM marks three shifts: from document classification to word-level sequence labeling, from black-box models to interpretable probabilistic models, and from single-author to mixed-authorship assumptions. Beyond improving detection accuracy, it deepens understanding of human-AI collaborative writing and offers a way to answer how much of a text each party contributed.