Zing Forum


Large Language Models Revolutionize Medical ICD Auto-Coding: From PLM-ICD to Next-Generation Intelligent Coding Systems

This article examines a research project that explores how state-of-the-art medical large language models (LLMs) can significantly improve the accuracy, interpretability, and practical effectiveness of automatic ICD code assignment from unstructured clinical records, including a comprehensive comparison against the existing baseline method PLM-ICD.

Tags: Large Language Models · ICD Coding · Medical AI · PLM-ICD · Clinical Text Processing · Multi-label Classification · Healthcare Informatics · Natural Language Processing
Published 2026-05-04 10:15 · Recent activity 2026-05-04 10:19 · Estimated read: 7 min

Section 01

[Introduction] Overview of Research on Large Language Models Revolutionizing Medical ICD Auto-Coding

This study focuses on using state-of-the-art medical large language models (LLMs) to improve the accuracy, interpretability, and practical effectiveness of ICD auto-coding on unstructured clinical records, with a multi-dimensional comparison against the existing baseline method PLM-ICD. Model performance is evaluated along three core dimensions: accuracy (micro-F1, macro-F1, AUPRC), interpretability (attention mechanisms, generated explanations), and practical deployment characteristics (inference speed, resource consumption, etc.), aiming to chart new technical directions for automating medical ICD coding.
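The accuracy metrics named above can be sketched concretely. The following is a minimal toy example, assuming binary indicator matrices for true and predicted code sets; micro-F1 pools counts over all codes (favoring frequent ones), while macro-F1 averages per-code F1 so rare codes count equally:

```python
import numpy as np

# Toy multi-label setup: 4 notes, 3 ICD codes (matrices are illustrative only).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1]])

def micro_f1(y_true, y_pred):
    # Pool TP/FP/FN over all labels, then compute a single F1.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn)

def macro_f1(y_true, y_pred):
    # Compute F1 per label, then average: rare codes weigh as much as common ones.
    f1s = []
    for j in range(y_true.shape[1]):
        tp = np.sum((y_true[:, j] == 1) & (y_pred[:, j] == 1))
        fp = np.sum((y_true[:, j] == 0) & (y_pred[:, j] == 1))
        fn = np.sum((y_true[:, j] == 1) & (y_pred[:, j] == 0))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))
```

In ICD coding the label space is huge and heavily skewed, which is why the gap between micro-F1 and macro-F1 is itself diagnostic: a large gap signals poor performance on rare codes.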


Section 02

Research Background: Urgent Need for Automation of Medical ICD Coding

In the modern healthcare system, ICD coding is the key link between clinical care and health administration. Traditional manual coding, however, is costly and slow, and struggles to keep pace with the volume of electronic medical record data. Pre-trained language models (PLMs) such as PLM-ICD brought automated coding within reach, and with the rise of large language models, whether they can deliver a qualitative leap has become a research focus. PLM-ICD uses BERT-style models to extract features and predict ICD codes, while the stronger capabilities of LLMs are expected to overcome the limitations of existing methods.


Section 03

Technical Architecture of the PLM-ICD Baseline Method

The technical architecture of PLM-ICD includes:

  1. Text Encoding Layer: Uses BERT or medical domain variants (e.g., ClinicalBERT, BioBERT) as encoders to learn semantic representations of medical terms;
  2. Label-Aware Attention Mechanism: For multi-label classification tasks, learns specific attention vectors for each ICD code to extract relevant information;
  3. Hierarchical Code Structure Utilization: Leverages the hierarchical structure of ICD (e.g., A00→A00.0) to ensure reasonable code combinations through hierarchical classification.
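The label-aware attention in step 2 can be illustrated with a minimal numpy sketch (real PLM-ICD is a trained PyTorch model; the random vectors here only stand in for learned parameters): each ICD code has its own attention vector, so each code pools a different, code-specific summary of the token sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d, L = 6, 8, 4             # tokens, hidden size, number of ICD codes (toy sizes)
H = rng.normal(size=(T, d))   # token representations from the encoder (e.g., BERT)
U = rng.normal(size=(L, d))   # one learned attention vector per ICD code

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Label-aware attention: each code attends to the tokens most relevant to it.
scores = U @ H.T               # (L, T) similarity of each code vector to each token
alpha = softmax(scores, axis=1)  # per-code attention weights over tokens
V = alpha @ H                  # (L, d) one label-specific document vector per code

# A per-label linear layer then yields one logit per ICD code.
W = rng.normal(size=(L, d))
logits = np.sum(W * V, axis=1)  # (L,) scores for the L codes
```

The per-code attention weights `alpha` are also what gives PLM-ICD its built-in interpretability: for any predicted code, the highest-weighted tokens point to the supporting text.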

Section 04

Technical Advantages of Large Language Models Over PLM-ICD

Medical LLMs (e.g., Med-PaLM, Meditron) have three major advantages over PLM-ICD:

  1. Extended Context Understanding: Supports longer token inputs (e.g., 4096+), enabling complete processing of long clinical records and capturing cross-paragraph associations;
  2. Rich Medical Knowledge Reserve: Pre-training covers massive medical literature and guidelines, enabling understanding of deep knowledge such as disease associations and diagnostic criteria;
  3. Generative Reasoning Capability: Can generate coding explanations and confidence notes, and even interactively ask clarifying questions, improving the user experience.
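To make advantage 3 concrete, here is a hypothetical sketch of how a generative coding workflow might be structured; the prompt wording, JSON schema, and example response are all assumptions, not an actual model API:

```python
import json

# Hypothetical prompt template for generative ICD coding with an
# instruction-tuned medical LLM (model call itself omitted).
PROMPT = """You are a clinical coding assistant.
Read the discharge note and return a JSON list of objects with fields
"code", "evidence" (a quoted span from the note), and "confidence" (0-1).

Note:
{note}
"""

def parse_coding_response(raw: str):
    """Parse the model's JSON output into (code, evidence, confidence) triples."""
    items = json.loads(raw)
    return [(it["code"], it["evidence"], float(it["confidence"])) for it in items]

# Example of the structured, explainable output such a model could return:
raw = '[{"code": "I10", "evidence": "history of essential hypertension", "confidence": 0.92}]'
codes = parse_coding_response(raw)
```

Unlike the attention weights of PLM-ICD, this kind of output pairs every code with an explicit evidence span and a confidence estimate that a human coder can audit directly.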

Section 05

Experimental Design and Dataset Description

The experiment uses the MIMIC-III/IV dataset (de-identified intensive care unit records and ICD code annotations). The evaluation protocol includes:

  • Time-sensitive data partitioning (training/validation/test separated in chronological order);
  • Performance reporting on the test set after hyperparameter tuning on the validation set;
  • Significance tests to verify performance improvements;
  • Error analysis to identify failure modes.

The compared models span medical LLMs of different scales (7B-70B parameters) and training strategies (pre-training, instruction fine-tuning, etc.).
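The time-sensitive partitioning in the protocol above can be sketched as follows; the record fields and split fractions are illustrative, but the key property holds: every training record predates every test record, so no future information leaks into training.

```python
from datetime import date

# Toy records: (admission_date, note_id). Field names are illustrative.
records = [
    (date(2019, 3, 1), "note-a"),
    (date(2019, 7, 15), "note-b"),
    (date(2020, 1, 9), "note-c"),
    (date(2020, 6, 2), "note-d"),
    (date(2021, 2, 20), "note-e"),
]

def chronological_split(records, train_frac=0.6, val_frac=0.2):
    """Sort by date, then cut into train/val/test so no future data leaks backward."""
    records = sorted(records, key=lambda r: r[0])
    n = len(records)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (records[:n_train],
            records[n_train:n_train + n_val],
            records[n_train + n_val:])

train, val, test = chronological_split(records)
```

A random split would look easier but overstates real-world performance, since clinical documentation style and coding practice drift over time.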

Section 06

Expected Outcomes and Clinical Application Value

Expected outcomes include:

  1. Technical Contribution: Establish performance benchmarks for medical LLMs in ICD coding tasks, revealing advantages and limitations;
  2. Practical Guide: Assist medical institutions in evaluating and selecting coding solutions, covering model selection, deployment costs, etc.;
  3. Open-Source Contribution: Publish code, models, and experimental records to promote community collaboration and reproduction.

These outcomes will drive the progress of medical AI coding technology, benefiting medical institutions and patients.

Section 07

Challenges and Future Development Directions

Applying LLMs to ICD coding faces challenges:

  1. Computational Resource Requirements: Inference costs are higher than PLMs; need to explore model compression, knowledge distillation, etc., to reduce overhead;
  2. Coding Consistency Assurance: Need to combine rule engines to ensure codes comply with ICD rules (e.g., code pairing/mutual exclusion);
  3. Continuous Learning and Adaptation: Need to respond to medical knowledge updates and ICD version revisions (e.g., ICD-9→10→11) to achieve rapid system adaptation.

Future research will focus on these directions.