Zing Forum

ANEP: A Deterministic Hybrid Framework for Name Extraction from News Videos — Interpretation of a Paper Accepted by IEEE CAI 2026

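Tags: computer vision, news video analysis, name extraction, YOLOv12, OCR, named entity recognition, IEEE CAI, explainable AI, object detection, multimodal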
Published 2026-05-02 07:44 | Recent activity 2026-05-02 09:43 | Estimated read 6 min
Section 01

ANEP Framework Introduction: A Deterministic Hybrid Solution for Name Extraction from News Videos

A research team at the University of Malta has proposed ANEP (Accurate Name Extraction Pipeline), a modular and interpretable framework that combines YOLOv12 object detection, OCR, and NER to automatically extract names from the on-screen text of news videos. Unlike black-box generative models, ANEP offers end-to-end traceability, which gives it clear advantages in transparency and auditability. The paper has been accepted at the IEEE Conference on Artificial Intelligence (CAI 2026), and the project won the 2025 Best Graduation Project Award of the Department of Artificial Intelligence at the University of Malta.
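
To make the idea of traceability concrete, here is a minimal sketch of what an audit record for one extracted name could look like. The `NameTrace` class, its field names, and the `audit_line` helper are illustrative assumptions, not ANEP's actual data model; the point is only that each output can carry the evidence from every stage that produced it.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class NameTrace:
    """Hypothetical audit record linking one extracted name to its evidence."""
    name: str                        # final name reported to the user
    frame_index: int                 # video frame the graphic was found in
    box: tuple[int, int, int, int]   # detected graphic region (x1, y1, x2, y2)
    detector_confidence: float       # score from the graphic detector
    ocr_text: str                    # raw text read from that region
    ner_label: str                   # entity label assigned to the name span


def audit_line(trace: NameTrace) -> str:
    """Serialise a trace to one JSON line suitable for an audit log."""
    return json.dumps(asdict(trace), sort_keys=True)


# Example values are invented for illustration.
trace = NameTrace("Maria Borg", 1450, (40, 600, 520, 660), 0.93,
                  "MARIA BORG | Minister", "PER")
print(audit_line(trace))
```

With a record like this, a reviewer who doubts a name can check in turn whether the detector framed the right region, whether the OCR read it correctly, and whether the NER step picked the right span.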

Section 02

Research Background and Problems: Challenges in Name Extraction from News Videos

As short-form video converges with traditional radio and television, the volume of video news content has grown explosively. Key information is often presented in graphic overlays, but their fonts, colors, and positions vary widely, which makes manual indexing impractical. User research shows that 59% of respondents find it difficult to read names in fast-paced news, which affects the viewing experience as well as content archiving, retrieval, and fact-checking. Existing generative multimodal models extract information end-to-end, but their black-box nature makes errors hard to trace, and this lack of interpretability is a serious flaw in the news field.

Section 03

ANEP Framework: Modular Architecture and Core Technical Components

The core concept of ANEP is "deterministic transparency", implemented as a four-stage pipeline:

1. News graphic detection: YOLOv12 fine-tuned on the team's self-built NGD dataset; YOLOv12-medium achieves 95.8% mAP@0.5.
2. Optical character recognition (OCR): adaptive preprocessing handles issues such as noise and blur.
3. Named entity recognition (NER): a Transformer-based model with zero-shot multilingual support.
4. Name clustering and timeline generation: different variants of the same person's name are merged, and a structured timeline with timestamps is generated.
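
The final stage, merging name variants and building a timeline, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: the `Sighting` type, the `same_person` rule (token subset or a `difflib` similarity of at least 0.85), and the greedy clustering are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Sighting:
    name: str         # name string produced by the NER stage
    timestamp: float  # seconds from the start of the video


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def same_person(a: str, b: str, threshold: float = 0.85) -> bool:
    """Assumed matching rule: one name's tokens are a subset of the other's
    (e.g. a title was added), or the strings are near-identical (OCR slip)."""
    a, b = normalise(a), normalise(b)
    ta, tb = set(a.split()), set(b.split())
    if ta <= tb or tb <= ta:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


def build_timeline(sightings: list[Sighting]) -> list[dict]:
    """Greedily cluster name variants, then list when each person appears."""
    clusters: list[list[Sighting]] = []
    for s in sorted(sightings, key=lambda x: x.timestamp):
        for cluster in clusters:
            if any(same_person(s.name, other.name) for other in cluster):
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no existing cluster matched
    timeline = []
    for cluster in clusters:
        timeline.append({
            # Use the longest variant as the display name.
            "name": max((c.name for c in cluster), key=len),
            "variants": sorted({c.name for c in cluster}),
            "timestamps": [c.timestamp for c in cluster],
        })
    return timeline


# Invented example data: three variants of one name plus a second person.
timeline = build_timeline([
    Sighting("Maria Borg", 12.0),
    Sighting("Dr Maria Borg", 47.5),
    Sighting("Maria B0rg", 80.0),   # simulated OCR error
    Sighting("John Vella", 95.0),
])
for entry in timeline:
    print(entry["name"], entry["timestamps"])
```

Because the rule is deterministic, any merge can be explained by pointing at the pair of strings that matched. The token-subset rule would also merge two different people who share a surname when only the surname is shown, so a real system would need stricter evidence.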

Section 04

ANEP vs Generative Models: Comparison of Performance and Interpretability

The research team compared ANEP with Gemini 1.5 and LLaMA 4 Maverick. Gemini 1.5 leads with an F1 score of 84.18%, but its black-box nature prevents error tracing. ANEP reaches an F1 score of 77.08%, with precision (79.9%) and recall (74.44%) in reasonable balance, and it avoids the hallucination problem common to generative models. This better matches the news field's requirement to "prefer underreporting over misreporting".
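
As a quick consistency check, the F1 score is the harmonic mean of precision and recall, so ANEP's figure can be recomputed from the two values quoted above. The helper name `f1_score` is just for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 79.9% and recall 74.44%, as quoted in the comparison.
anep_f1 = f1_score(0.799, 0.7444)
print(f"{anep_f1:.2%}")  # prints 77.07%
```

The result is about 77.07%, which agrees with the reported 77.08% once you allow for the precision being quoted to only one decimal place.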

Section 05

NGD Dataset Contribution and Practical Deployment Scenarios

The team built the News Graphic Dataset (NGD), a manually annotated dataset covering a range of source styles, from traditional TV stations to social-media-native content. It has been released openly on the Roboflow platform. For deployment, ANEP provides a web interface (upload videos, view and export results) and an API, and it supports both a local mode (to meet privacy needs) and a cloud mode (for large-scale processing).

Section 06

Limitations and Future Research Directions

Currently, ANEP mainly supports the Python programming language, and both its multilingual support and its adaptability to complex graphic styles need improvement. Future directions include expanding support for multilingual news content, using temporal information to improve the accuracy of name association, exploring hybrid architectures with generative models (to balance interpretability and accuracy), and building interactive feedback mechanisms to assist manual review.

Section 07

Conclusion: The Value of Interpretable AI in High-Risk Fields

ANEP invites a re-examination of how AI systems are designed. In high-risk fields such as news, medical care, and law, interpretability and auditability are as important as accuracy. Its modular architecture, end-to-end traceability, and avoidance of hallucination make it well suited to professional media organizations and fact-checking teams, and its usefulness is likely to increase as video news continues to grow.