Section 01
DigitalRegistrar Project Introduction: LLM-Driven Structured Data Extraction from Pathology Reports
This article introduces DigitalRegistrar, a medical AI data processing pipeline that uses large language models (LLMs) to read pathology reports, automatically extract structured information, and output it as JSON. The project is maintained by kblab2024, is open source on GitHub (https://github.com/kblab2024/digitalregistrar), and was released on 2026-05-23. It targets problems caused by unstructured pathology reports, such as information that is hard to retrieve and data that is hard to analyze. By converting unstructured text into computable structured data, it aims to support clinical decision-making, faster research, and quality control.
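To make the idea of "unstructured report in, JSON out" concrete, the following is a minimal sketch of the final step only: turning an LLM's text reply into a checked, typed record. It is not taken from the DigitalRegistrar code base. The field names (`specimen_site`, `histologic_type`, `tumor_size_mm`), the `PathologyRecord` class, the `parse_llm_reply` helper, and the sample reply are all hypothetical and chosen purely for illustration; the LLM call itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathologyRecord:
    # Hypothetical fields for illustration; the project's real schema may differ.
    specimen_site: str
    histologic_type: str
    tumor_size_mm: Optional[float] = None


REQUIRED_FIELDS = ("specimen_site", "histologic_type")


def parse_llm_reply(reply: str) -> PathologyRecord:
    """Extract the JSON object from an LLM reply and check required keys."""
    # Models often wrap JSON in extra prose, so take the outermost braces.
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")

    data = json.loads(reply[start:end + 1])

    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    size = data.get("tumor_size_mm")
    return PathologyRecord(
        specimen_site=str(data["specimen_site"]),
        histologic_type=str(data["histologic_type"]),
        tumor_size_mm=float(size) if size is not None else None,
    )


# Made-up reply text standing in for real model output.
sample_reply = (
    'Here is the result: {"specimen_site": "colon", '
    '"histologic_type": "adenocarcinoma", "tumor_size_mm": 32}'
)
record = parse_llm_reply(sample_reply)
print(record)
```

Validating the reply before storing it is one plausible design choice for a pipeline like this, since a record with missing or malformed fields is rejected early instead of reaching downstream analysis.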