HDM-HMM: A New AI Detection Method for Mixed Authorship Based on Sequential Stylometry

HDM-HMM is an innovative AI detection method for mixed-authorship documents. It achieves word-level author inference using a Hierarchical Dirichlet-Multinomial Hidden Markov Model, reducing the error rate by over 40% compared to traditional methods when detecting text co-created by humans and AI.

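HDM-HMM, AI detection, mixed authorship, stylometry, hidden Markov model, function words, sequence labeling, text forensics, academic integrity, Bayesian hierarchical model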

Published 2026-03-30 06:45 | Recent activity 2026-03-30 06:50 | Estimated read 6 min

Section 01

HDM-HMM: Introduction to the New AI Detection Method for Mixed Authorship

HDM-HMM is an innovative AI detection method for mixed-authorship documents (co-created by humans and AI). It achieves word-level author inference using a Hierarchical Dirichlet-Multinomial Hidden Markov Model. Treating detection as a sequence labeling problem, this method addresses the failure of traditional binary classification methods in real-world mixed scenarios, reducing the error rate by over 40% compared to traditional methods and providing a new tool for maintaining academic integrity and information authenticity.

Section 02

Practical Challenges in Mixed Authorship Detection

Most existing detectors of AI-generated text assume that a document is either entirely human-written or entirely AI-generated. That binary assumption holds in lab settings but breaks down in practice, where many documents have mixed authorship: a human writes part of the content and an AI generates, modifies, or continues the rest. Examples include a book review that opens with the reviewer's own thoughts and closes with an AI-written summary, or a report whose framework is built by a human and whose details are filled in by AI. Whole-document detection cannot locate the AI-written segments in such texts.

Section 03

Technical Framework and Core Innovations of HDM-HMM

HDM-HMM treats detection as a sequence labeling problem in which each word is labeled as human or AI. It uses a Hidden Markov Model (HMM) as the base framework and adds Hierarchical Dirichlet-Multinomial modeling to mitigate data sparsity. The features are function words (200 categories, including articles and conjunctions), which balance stability and interpretability. The Viterbi algorithm then performs word-level inference and boundary detection, capturing where the writing style switches.
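To make the decoding step concrete, here is a minimal sketch of a two-state HMM decoded with the Viterbi algorithm. It is not the authors' implementation: the three-category vocabulary, the counts, the transition probabilities, and the `concentration` hyperparameter are all illustrative assumptions, and `smoothed_emissions` is a simple Dirichlet posterior-mean estimate standing in for the full hierarchical model.

```python
import math

HUMAN, AI = 0, 1  # hidden states: who wrote each token


def smoothed_emissions(counts, prior_mean, concentration):
    """Posterior-mean emission probabilities under a Dirichlet prior.

    counts: observed category counts for one author
    prior_mean: shared base distribution over categories (sums to 1)
    concentration: prior strength (hypothetical hyperparameter)
    """
    total = sum(counts) + concentration
    return [(c + concentration * m) / total for c, m in zip(counts, prior_mean)]


def viterbi(obs, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path for the observation indices."""
    n_states = len(log_start)
    score = [log_start[s] + log_emit[s][obs[0]] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy setup: 3 function-word categories instead of 200 (assumed values).
prior_mean = [1 / 3, 1 / 3, 1 / 3]
human_emit = smoothed_emissions([8, 1, 1], prior_mean, concentration=3.0)
ai_emit = smoothed_emissions([1, 1, 8], prior_mean, concentration=3.0)

log_emit = [[math.log(p) for p in human_emit], [math.log(p) for p in ai_emit]]
log_start = [math.log(0.5), math.log(0.5)]
# "Sticky" transitions: authors rarely switch from one word to the next.
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]

obs = [0, 0, 0, 2, 2, 2]  # category index of each function word, in order
path = viterbi(obs, log_start, log_trans, log_emit)
print(path)  # [0, 0, 0, 1, 1, 1]
```

The decoded path switches from `HUMAN` to `AI` at the fourth token, which is the authorship boundary in this toy sequence. The sticky transition matrix is what lets context outweigh a single ambiguous word.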

Section 04

Experimental Design and Comparative Results

The experiments use a mixed-authorship dataset built from human-written Amazon book review segments followed by GPT continuations, in three scenarios: balanced mixing, short AI segments, and AI-dominated text. The baselines include a Multinomial HMM, rolling stylometry methods, and GPT-2 perplexity, among others. HDM-HMM has the lowest error rate in every scenario: 4.4% for balanced mixing, 5.1% for short AI segments, and 3.2% for AI-dominated text. That is about 40% lower than the Multinomial HMM and over 60% lower than the best rolling method.
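The comparison above rests on two simple quantities: a per-word error rate and a relative reduction against a baseline. The sketch below shows how they could be computed, assuming the error rate is the fraction of words whose predicted author differs from the true one (the source does not spell out the exact definition). The labels and rates are made-up illustrations, not figures from the experiments.

```python
def word_error_rate(true_labels, predicted_labels):
    """Fraction of words whose predicted author differs from the true author."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def relative_reduction(baseline_rate, new_rate):
    """Relative drop in error rate, e.g. 0.4 means 40% lower than baseline."""
    return (baseline_rate - new_rate) / baseline_rate


# Illustrative document: 6 human words followed by 4 AI words.
true_labels = ["H"] * 6 + ["A"] * 4
# A hypothetical prediction that places the boundary one word too early.
predicted = ["H"] * 5 + ["A"] * 5

rate = word_error_rate(true_labels, predicted)
print(rate)  # 0.1
```

Under this definition, a boundary placed one word off in a ten-word document costs a 10% error rate, which is why precise boundary detection matters most when AI segments are short.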

Section 05

Analysis of Advantages and Limitations of HDM-HMM

Advantages: sequence modeling uses context to locate boundaries more accurately; the hierarchical Dirichlet prior acts as regularization and improves robustness; function-word features keep the model interpretable. Limitations: the fixed function-word list may not transfer to some languages or domains; the model supports only two categories, human and AI; computational cost is high for long documents.

Section 06

Practical Application Scenarios of HDM-HMM

In academic integrity, it can assess at fine granularity how much AI assistance went into students' homework. In news and publishing, it can detect undeclared AI content in submissions. In legal forensics, it can assist document authorship analysis. In AI security research, it can help advance attack and defense techniques.

Section 07

Research Significance and Prospects of HDM-HMM

HDM-HMM marks three shifts: from document classification to word-level sequence labeling, from black-box models to interpretable probabilistic models, and from single-author to mixed-authorship assumptions. Beyond improving detection accuracy, it deepens the understanding of human-AI collaborative writing and offers a tool for answering the question of how much humans and AI each contributed.
