# ANEP: A Deterministic Hybrid Framework for Name Extraction from News Videos — Interpretation of a Paper Accepted by IEEE CAI 2026

> The research team from the University of Malta proposed ANEP (Accurate Name Extraction Pipeline), a modular and interpretable name extraction framework that combines YOLOv12 object detection, OCR, and NER technologies to automatically extract names from news video subtitles. Compared to black-box generative models, ANEP provides full-link traceability and has significant advantages in transparency and auditability.

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-05-01T23:44:56.000Z
- 最近活动: 2026-05-02T01:43:42.558Z
- 热度: 153.0
- 关键词: 计算机视觉, 新闻视频分析, 人名提取, YOLOv12, OCR, 命名实体识别, IEEE CAI, 可解释AI, 目标检测, 多模态
- 页面链接: https://www.zingnex.cn/en/forum/thread/anep-ieee-cai-2026
- Canonical: https://www.zingnex.cn/forum/thread/anep-ieee-cai-2026
- Markdown 来源: floors_fallback

---

## ANEP Framework Introduction: A Deterministic Hybrid Solution for Name Extraction from News Videos

The research team from the University of Malta proposed ANEP (Accurate Name Extraction Pipeline), a modular and interpretable name extraction framework that combines YOLOv12 object detection, OCR, and NER technologies to automatically extract names from news video subtitles. Compared to black-box generative models, ANEP provides full-link traceability and has significant advantages in transparency and auditability. This achievement has been accepted by the IEEE Conference on Artificial Intelligence (CAI 2026) and won the Best Graduation Project Award of the Department of Artificial Intelligence at the University of Malta in 2025.

## Research Background and Problems: Challenges in Name Extraction from News Videos

With the integration of short videos and traditional radio and television, video news content has grown explosively. Key information is often presented in graphic overlays, but styles such as fonts, colors, and positions are diverse, making manual indexing impractical. User research shows that 59% of respondents find it difficult to see names in fast-paced news, affecting experience as well as content archiving, retrieval, and fact-checking. Existing generative multimodal models extract information end-to-end, but their black-box nature makes errors hard to trace, and their lack of interpretability in the news field is a serious flaw.

## ANEP Framework: Modular Architecture and Core Technical Components

The core concept of ANEP is "deterministic transparency", using a four-stage pipeline: 1. News Graphic Detection (fine-tuned based on YOLOv12 on the self-built NGD dataset; YOLOv12-medium achieves 95.8% mAP@0.5); 2. Optical Character Recognition (adaptive preprocessing to handle issues like noise and blurriness); 3. Named Entity Recognition (Transformer-based NER, supporting zero-shot multilingual); 4. Name Clustering and Timeline Generation (merge different name variants of the same person and generate a structured timeline with timestamps).

## ANEP vs Generative Models: Comparison of Performance and Interpretability

The research team compared ANEP with Gemini 1.5 and LLaMA 4 Maverick: Gemini 1.5 leads with an F1 score of 84.18%, but its black-box nature makes error tracing impossible; ANEP has an F1 score of 77.08%, with a balanced precision of 79.9% and recall of 74.44%, and avoids the common hallucination problem of generative models, which better meets the news field's demand of "prefer underreporting over misreporting".

## NGD Dataset Contribution and Practical Deployment Scenarios

ANEP built the News Graphic Dataset (NGD), with manual annotations covering various source styles such as traditional TV stations and social media native content, which has been open on the Roboflow platform. For deployment, it provides a Web interface (upload videos, view and export results) and API interfaces, supporting both local (to meet privacy needs) and cloud (to handle large-scale processing) modes.

## Limitations and Future Research Directions

Currently, ANEP mainly supports the Python programming language, and its multilingual support and adaptability to complex graphic styles need to be improved. Future directions include: expanding support for multilingual news content, introducing temporal information to improve the accuracy of name association, exploring hybrid architectures with generative models (balancing interpretability and accuracy), and building interactive feedback mechanisms to assist manual review.

## Conclusion: The Value of Interpretable AI in High-Risk Fields

ANEP re-examines the design philosophy of AI systems. In high-risk fields such as news, medical care, and law, interpretability and auditability are as important as accuracy. Its modular architecture, full-link traceability, and hallucination-free features make it an ideal tool for professional media organizations and fact-checking teams, and it will play a greater role as video news grows.