The core goal of poster2json is to convert scientific posters into structured JSON data that complies with the poster-json-schema standard, which is based on the widely adopted DataCite 4.7 metadata specification. The project uses a multi-model collaboration technical architecture, selecting the most suitable model for different types of input and extraction tasks.
For JSON structuring tasks, the project uses a specially fine-tuned Llama-3.1-8B-Poster-Extraction model. This model has been specifically trained on academic poster corpora, enabling it to understand the organizational structure of academic content and organize extracted text information into compliant JSON objects.
For image-format posters, the project uses the Qwen2-VL-7B vision-language model for OCR recognition. This model has strong visual understanding capabilities, allowing it to handle complex mixed text-image layouts in posters and accurately identify text areas and extract content.
For PDF-format posters, the project uses the pdfalto tool for layout-aware text extraction, which preserves the document's structural information instead of simply outputting plain text. This multi-stage, multi-model processing flow ensures high-quality extraction results under various input conditions.