# LLM and VLM-based Intelligent ECG Annotation System: A Multimodal Medical Data Processing Pipeline for the PTB-XL Dataset

> This article introduces a complete pipeline for ECG data preprocessing and annotation that integrates large language models (LLMs) and vision-language models (VLMs). It achieves diagnostic label extraction and signal quality assessment through a multi-model consistency mechanism, providing a high-quality data foundation for training medical AI models.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-23T10:44:17.000Z
- Last activity: 2026-04-23T10:52:22.292Z
- Popularity: 154.9
- Keywords: ECG, electrocardiogram, LLM, VLM, medical AI, PTB-XL, data annotation, multimodal, signal processing, deep learning
- Page link: https://www.zingnex.cn/en/forum/thread/llmvlm-ptb-xl-pipeline
- Canonical: https://www.zingnex.cn/forum/thread/llmvlm-ptb-xl-pipeline
- Markdown source: floors_fallback

---

## Introduction to the LLM and VLM-based Intelligent ECG Annotation System

This system integrates large language models (LLMs) and vision-language models (VLMs) into a complete pipeline for ECG data preprocessing and annotation. Through a multi-model consistency mechanism, it extracts structured diagnostic labels and assesses signal quality, providing a high-quality data foundation for training medical AI models. Built on the PTB-XL dataset, the system combines multiple technical approaches to enhance data reliability.

## Project Background and Core Challenges

The quality of ECG data annotation directly affects the performance of AI diagnostic models. Traditional manual annotation is costly, and consistency across annotators is difficult to guarantee. When processing large-scale medical datasets, the key challenges are efficiently and accurately extracting structured diagnostic labels and assessing signal quality. This project builds a complete pipeline on the PTB-XL dataset, combining LLMs and VLMs with multi-stage consistency checks to automate the conversion of raw reports into high-quality training data.

## Technical Methods for Diagnostic Label Extraction

Diagnostic label extraction proceeds in three stages:

1. **Human-involved sample screening**: metadata fields identify samples that require manual review, and reports with human participation are prioritized to reduce noise.
2. **Three-LLM arbitration**: LLM1 and LLM2 independently parse each report and extract SNOMED CT standardized labels; LLM3 arbitrates disagreements. This stage supports resuming from checkpoints, and labels that cannot be mapped to SNOMED CT are retained as `unmapped`.
3. **Label consistency screening and repair**: only labels on which LLM1 and LLM2 agree, or which LLM3's arbitration confirms, are retained. The `unmapped` labels are then processed, and medical logic rules clean the result (e.g., the normal label is deleted when abnormal and normal labels coexist).
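The arbitration and cleaning logic above can be sketched as plain set operations. This is a minimal illustration, not the project's actual code: the label names `NORM` and `unmapped` are hypothetical placeholders, and the real pipeline operates on SNOMED CT codes.

```python
def arbitrate(labels1: set[str], labels2: set[str], labels3: set[str]) -> set[str]:
    """Keep labels on which the two primary LLMs agree; for disputed
    labels (produced by only one of them), accept a label only if the
    arbiter LLM3 also produced it."""
    agreed = labels1 & labels2
    disputed = labels1 ^ labels2   # symmetric difference: one model only
    return agreed | (disputed & labels3)


def clean_labels(labels: set[str]) -> set[str]:
    """Medical-logic cleanup: drop the normal label when any abnormal
    label coexists. 'unmapped' is treated as neither normal nor abnormal
    and is simply carried through for later processing."""
    abnormal = {l for l in labels if l not in {"NORM", "unmapped"}}
    if abnormal and "NORM" in labels:
        labels = labels - {"NORM"}
    return labels
```

Because the arbiter only breaks ties rather than contributing new labels, a label never enters the output unless at least two of the three models produced it.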

## Technical Scheme for Signal Quality Assessment

Signal quality assessment adopts a two-stage strategy:

1. **Signal-processing-based quality scoring**: lead-level quality metrics (baseline drift, high-frequency noise ratio) are computed for each ECG record; filtering and decomposition techniques localize problems, and only the worst 5% of abnormal leads are retained as candidates.
2. **VLM-based visual verification**: candidate ECG signals are rendered as images and evaluated by the VLM in two independent runs; a quality issue is confirmed only when both runs agree the lead is abnormal. This compensates for the limitations of purely code-based methods.
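The first stage can be sketched with standard band-splitting filters. The cutoff frequencies (0.5 Hz for baseline drift, 40 Hz for high-frequency noise) and the 500 Hz sampling rate are assumptions chosen for illustration; the project's actual thresholds and decomposition methods are not specified in the article.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lead_quality_metrics(lead: np.ndarray, fs: float = 500.0) -> dict:
    """Per-lead quality metrics: the fraction of signal power in the
    baseline-drift band (< 0.5 Hz) and in the high-frequency noise
    band (> 40 Hz), both relative to total power."""
    total = float(np.sum(lead.astype(float) ** 2)) + 1e-12
    sos_lp = butter(4, 0.5, btype="low", fs=fs, output="sos")
    drift = sosfiltfilt(sos_lp, lead)
    sos_hp = butter(4, 40.0, btype="high", fs=fs, output="sos")
    noise = sosfiltfilt(sos_hp, lead)
    return {
        "drift_ratio": float(np.sum(drift ** 2)) / total,
        "hf_noise_ratio": float(np.sum(noise ** 2)) / total,
    }


def top_candidates(scores: np.ndarray, frac: float = 0.05) -> np.ndarray:
    """Indices of the worst `frac` of leads (highest combined score):
    only these are rendered and sent to the VLM for visual checking."""
    k = max(1, int(len(scores) * frac))
    return np.argsort(scores)[-k:]
```

Restricting VLM calls to the worst few percent of leads is what makes the two-stage design economical: the cheap signal-processing pass does the bulk filtering, and the expensive visual model only adjudicates borderline cases.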

## Data Fusion and Final Output

The system fuses diagnostic labels and signal quality labels under a conservative rule: a sample is marked quality-abnormal only when both independent VLM runs confirm it. The final output combines structured diagnostic labels with lead-level signal quality labels, supporting model interpretability analysis and suiting multi-task learning and robustness training scenarios.
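The conservative fusion rule and the shape of the final record can be sketched as follows. The record fields (`id`, `diagnoses`, `bad_leads`) and the lead names are hypothetical; the article does not specify the output schema.

```python
def fuse_quality(vlm_run1: dict[str, bool], vlm_run2: dict[str, bool]) -> dict[str, bool]:
    """Conservative fusion: a lead is flagged quality-abnormal only
    when BOTH independent VLM runs mark it abnormal."""
    leads = set(vlm_run1) | set(vlm_run2)
    return {l: vlm_run1.get(l, False) and vlm_run2.get(l, False) for l in leads}


def build_record(record_id: str, diag_labels: set[str], quality: dict[str, bool]) -> dict:
    """Final training record: structured diagnostic labels plus
    lead-level quality flags, kept decoupled as in the pipeline."""
    return {
        "id": record_id,
        "diagnoses": sorted(diag_labels),
        "bad_leads": sorted(l for l, bad in quality.items() if bad),
    }
```

Requiring agreement between the two VLM runs trades recall for precision: a few genuinely noisy leads may slip through, but leads flagged in the output are unlikely to be false positives, which is the right bias for a training-data label.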

## Technical Highlights and Innovative Value

Core innovations include:

1. **Systematic multi-model consistency**: both diagnostic label extraction and signal quality verification run multiple models independently and arbitrate their outputs.
2. **Signal processing combined with visual models**: the two-stage design of code-based screening followed by VLM verification balances efficiency and accuracy.
3. **Design details grounded in medical data**: diagnostic labels and signal quality are processed in a decoupled fashion, and human-participated reports are prioritized, reflecting a deep understanding of the characteristics of medical data.

## Application Prospects and Open-Source Value

This pipeline provides a reusable solution for constructing high-quality medical AI data. The experience gained on PTB-XL can be transferred to other ECG datasets and extended to other medical imaging and signal processing scenarios. The project is open-sourced under the MIT license, supports mainstream API services such as DeepSeek and DashScope, and offers good scalability and practicality, making it a complete engineering example for medical AI research.
