Zing Forum

Poster2JSON: Automatically Extract Structured Metadata from Academic Posters Using Large Language Models

Poster2JSON, an open-source project from the FAIR Data Hub team, uses large language models to convert academic posters in PDF or image form into structured JSON metadata. It addresses a long-standing gap: poster content is hard to digitize and hard to give machine-readable semantics.

Tags: academic posters, large language models, OCR, metadata extraction, FAIR principles, multimodal AI, research digitization
Published 2026-05-02 03:40 | Recent activity 2026-05-02 03:49 | Estimated read 5 min

Section 01

Introduction to the Poster2JSON Project

Poster2JSON, an open-source project from the FAIR Data Hub team, uses large language models to convert academic posters in PDF or image form into structured JSON metadata. By making poster content digitized and semantically structured, it makes academic results easier to share openly and reuse.

Section 02

The Digitization Dilemma of Academic Posters

Academic posters are an important vehicle for disseminating research results, but they usually exist only as PDFs or high-resolution images. That makes their content hard for search engines to index, hard to link into knowledge graphs, and hard to analyze with large-scale data mining. Traditional OCR extracts the text but has no semantic understanding of the poster's structure, so titles, authors, methods, and other elements come out mixed together. The result is not standardized, machine-processable data, which seriously hinders the open sharing and reuse of academic results.

Section 03

Technical Route and Implementation of Poster2JSON

The core goal of Poster2JSON is to convert unstructured academic posters into structured JSON metadata, using the visual and text understanding of multimodal large models (such as GPT-4V and Claude 3). The workflow has three stages: preprocessing (resolution adjustment and layout analysis), multimodal recognition (prompts guide the model to identify each component of the poster), and mapping of the recognized components onto a predefined JSON Schema to produce standardized metadata. Compared with traditional computer vision methods, this approach needs no template-specific training, so it generalizes better across poster layouts, is more robust to layout variation, and costs less to maintain.
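
The three stages described above can be sketched roughly as follows. This is only an illustrative outline, not the project's actual code: the field names (`title`, `authors`, `methods`), the prompt text, and the functions `preprocess`, `validate`, `poster_to_json`, and `fake_model` are all assumptions made for this example, and the model call is replaced by a stub so the sketch runs without any API key.

```python
import json
from typing import Callable

# Hypothetical field list; the project's real JSON Schema is not shown in this article.
REQUIRED_FIELDS = ("title", "authors", "methods")

# Hypothetical prompt that steers the model toward the fields above.
PROMPT = (
    "Extract the poster's title, authors, and methods. "
    "Reply with a single JSON object with the keys: title, authors, methods."
)


def preprocess(image_bytes: bytes) -> bytes:
    """Stage 1 placeholder: resolution adjustment and layout analysis would go here."""
    return image_bytes


def validate(record: dict) -> dict:
    """Stage 3: check the model output against the (assumed) schema and drop extra keys."""
    missing = [key for key in REQUIRED_FIELDS if key not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(record["title"], str):
        raise ValueError("title must be a string")
    if not isinstance(record["authors"], list):
        raise ValueError("authors must be a list")
    return {key: record[key] for key in REQUIRED_FIELDS}


def poster_to_json(image_bytes: bytes, call_model: Callable[[bytes, str], str]) -> dict:
    """Run preprocessing, prompt-guided recognition, and schema mapping in order."""
    prepared = preprocess(image_bytes)
    raw_reply = call_model(prepared, PROMPT)  # Stage 2: multimodal model call
    return validate(json.loads(raw_reply))


def fake_model(image: bytes, prompt: str) -> str:
    """Stub standing in for a real multimodal model; returns a canned JSON reply."""
    return json.dumps(
        {
            "title": "Example Poster",
            "authors": ["A. Author"],
            "methods": "Survey",
            "extra": "not part of the schema",
        }
    )


record = poster_to_json(b"fake image bytes", fake_model)
print(json.dumps(record, indent=2))
```

Passing the model call in as a function keeps the sketch independent of any particular vendor SDK. A real implementation would replace `fake_model` with a call to a multimodal model and would likely use a full JSON Schema validator rather than the hand-written checks shown here.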

Section 04

Application Scenarios of Poster2JSON

Poster2JSON applies at several levels. Individual researchers can batch-process conference posters to build a searchable personal literature library. Conference organizers can build digital archives that support full-text retrieval and research trend analysis. At the macro level, the output JSON can be imported into knowledge graphs and linked to databases of papers, patents, and similar records, supporting research policy formulation, research evaluation, and technology transfer.
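
As a rough illustration of the "searchable personal library" scenario, the sketch below builds a small in-memory inverted index over extracted records. The record fields (`title`, `authors`, `methods`), the sample records, and the helper names are assumptions for this example; the article does not specify the project's output schema or any search tooling.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: list[dict]) -> dict[str, set[int]]:
    """Map each token to the set of record positions that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for position, rec in enumerate(records):
        text = " ".join(
            [rec.get("title", ""), " ".join(rec.get("authors", [])), rec.get("methods", "")]
        )
        for token in tokenize(text):
            index[token].add(position)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return positions of records that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(hits)


# Made-up records in the assumed shape of the extracted metadata.
posters = [
    {
        "title": "Graph Neural Networks for Protein Folding",
        "authors": ["L. Chen"],
        "methods": "message passing",
    },
    {
        "title": "Protein Structure Search at Scale",
        "authors": ["M. Ortiz"],
        "methods": "approximate nearest neighbour",
    },
]

index = build_index(posters)
print(search(index, "protein"))
print(search(index, "protein folding"))
```

A conference-scale archive would more plausibly load the same JSON into a full-text search engine or database, but the principle is the same: once the poster is structured data, retrieval becomes an ordinary indexing problem.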

Section 05

Open Source Ecosystem and FAIR Principles

As a FAIR Data Hub project, Poster2JSON follows the FAIR data management principles (Findable, Accessible, Interoperable, Reusable). It is released under an open-source license, with the code hosted on GitHub. The output JSON Schema is compatible with existing academic metadata standards such as Schema.org's ScholarlyArticle and Dublin Core, which lowers the barrier to building downstream applications.
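
To make the compatibility claim concrete, here is a minimal sketch of how an extracted record might be mapped onto Schema.org's ScholarlyArticle (as JSON-LD) and onto Dublin Core elements. The input field names (`title`, `authors`, `keywords`) and both mapping functions are assumptions for illustration; the project's actual schema and its mapping rules are not given in this article.

```python
import json


def to_schema_org(rec: dict) -> dict:
    """Map an extracted record to a Schema.org ScholarlyArticle JSON-LD object."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": rec["title"],
        "author": [{"@type": "Person", "name": name} for name in rec["authors"]],
        "keywords": ", ".join(rec.get("keywords", [])),
    }


def to_dublin_core(rec: dict) -> dict:
    """Map the same record to Dublin Core elements (title, creator, subject)."""
    return {
        "dc:title": rec["title"],
        "dc:creator": list(rec["authors"]),
        "dc:subject": list(rec.get("keywords", [])),
    }


# Made-up record in the assumed shape of the extracted metadata.
poster = {
    "title": "Example Poster",
    "authors": ["A. Author", "B. Author"],
    "keywords": ["metadata", "posters"],
}

jsonld = to_schema_org(poster)
dublin_core = to_dublin_core(poster)
print(json.dumps(jsonld, indent=2))
print(json.dumps(dublin_core, indent=2))
```

Because both targets are plain key-value vocabularies, a mapping of this kind is mostly a renaming exercise, which is what makes a compatible schema cheap for downstream tools to consume.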

Section 06

Outlook: Intelligent Transformation of Academic Publishing

Poster2JSON points to where academic publishing is heading as it becomes more intelligent. More automated tools for processing academic content are likely to emerge, pushing research outputs toward end-to-end digitization. By adopting such tools, Chinese research institutions and academic publishing platforms can make their academic services smarter and strengthen their position in the international open science movement. AI is reshaping the infrastructure of knowledge production and dissemination.