Zing Forum


MinerU: An Open-Source Tool for Converting Complex Documents into LLM-Friendly Formats

This article introduces MinerU, an open-source document parsing tool that converts complex documents like PDFs, images, and DOCX files into machine-readable Markdown and JSON formats. It supports formula recognition, table extraction, OCR, and other features, making it an ideal data preprocessing tool for building Agent workflows.

Tags: Document parsing, PDF, OCR, Markdown, LLM, Agent, Table recognition, Formula recognition
Published 2026-03-31 01:44 | Recent activity 2026-03-31 01:53 | Estimated read 5 min

Section 01

MinerU: An Open-Source Tool for Converting Complex Documents into LLM-Friendly Formats

MinerU is an open-source document parsing tool designed to solve the document structuring challenges of the LLM and Agent era. It accepts PDF, image, and DOCX inputs and converts them into Markdown or JSON. With core features such as formula recognition, table extraction, and OCR, it is well suited to preprocessing documents for Agent workflows and RAG systems.


Section 02

Project Origin and Background

MinerU originated during the pre-training of the InternLM large model, where it focused on solving the symbol conversion problem in scientific literature. Its open-source nature and rapid iteration set it apart from commercial products and have made it an important player in the document parsing field.


Section 03

Comprehensive Core Features

  • Multi-format input: Supports PDF, image, and DOCX; v3.0 parses DOCX natively, with a reported speedup of dozens of times;
  • Intelligent content extraction: Automatically removes redundant elements such as headers and footers, outputs content in reading order, and preserves structural hierarchy;
  • Formula and table recognition: Converts formulas to LaTeX and tables to HTML; supports images and formulas inside tables, as well as numbering of display (block-level) formulas;
  • OCR and multi-language support: Supports 109 languages; v3.0 adds recognition of vertical text and seals (stamps).
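
Because formulas come out as LaTeX and tables as HTML, downstream code can pull them out of the converted Markdown with ordinary text processing. The following is a minimal sketch under stated assumptions: the sample Markdown, the `$...$` / `$$...$$` delimiters, and the `extract_elements` helper are all illustrative and not taken from MinerU itself, whose actual output conventions may differ.

```python
import re

# Hypothetical sample of converted Markdown; real MinerU output may differ.
SAMPLE_MD = r"""# Results

The energy relation is $E = mc^2$ in inline form.

$$
\int_0^1 x^2 \, dx = \frac{1}{3}
$$

<table><tr><td>Model</td><td>Score</td></tr><tr><td>A</td><td>86.2</td></tr></table>
"""

# Assumed delimiters: $$...$$ for display formulas, $...$ for inline formulas.
BLOCK_FORMULA = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
INLINE_FORMULA = re.compile(r"\$([^$\n]+)\$")
HTML_TABLE = re.compile(r"<table\b[^>]*>.*?</table>", re.DOTALL | re.IGNORECASE)


def extract_elements(markdown: str) -> dict:
    """Collect display formulas, inline formulas, and HTML tables from Markdown."""
    block = [m.strip() for m in BLOCK_FORMULA.findall(markdown)]
    # Drop display formulas first so their delimiters are not read as inline ones.
    without_block = BLOCK_FORMULA.sub("", markdown)
    inline = [m.strip() for m in INLINE_FORMULA.findall(without_block)]
    tables = HTML_TABLE.findall(markdown)
    return {"block_formulas": block, "inline_formulas": inline, "tables": tables}


elements = extract_elements(SAMPLE_MD)
```

Regular expressions are enough for a quick inventory like this; parsing the table HTML itself would call for a proper HTML parser.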

Section 04

Technical Architecture and Performance Upgrade

  • Dual-backend design: a Pipeline backend (runs on CPU, or on a GPU with 4GB VRAM; OmniDocBench score: 86.2) and a VLM backend (accuracy above 90 points; requires 8GB+ VRAM);
  • v3.0 upgrade highlights: Architecture optimization (the Pipeline backend's accuracy now exceeds that of the previous VLM backend), API/CLI orchestration, asynchronous tasks, multi-GPU deployment, memory optimization, and thread safety;
  • License cleanup: Removed models licensed under AGPLv3 and CC-BY-NC-SA, making the overall licensing friendlier.
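
The hardware figures above suggest a simple rule for choosing a backend. The sketch below encodes that rule under stated assumptions: `choose_backend` is a hypothetical helper, the strings "pipeline" and "vlm" are illustrative labels rather than confirmed CLI values, and the thresholds are the 4GB and 8GB figures quoted in this section.

```python
from typing import Optional, Tuple

# Thresholds taken from the figures quoted in this section.
PIPELINE_GPU_MIN_VRAM_GB = 4.0
VLM_MIN_VRAM_GB = 8.0


def choose_backend(vram_gb: Optional[float]) -> Tuple[str, str]:
    """Return (backend, device) for the given GPU memory; None means no GPU.

    The labels are illustrative, not confirmed MinerU option values.
    """
    if vram_gb is not None and vram_gb >= VLM_MIN_VRAM_GB:
        return ("vlm", "gpu")
    if vram_gb is not None and vram_gb >= PIPELINE_GPU_MIN_VRAM_GB:
        return ("pipeline", "gpu")
    # Too little VRAM or no GPU at all: fall back to the CPU-capable backend.
    return ("pipeline", "cpu")
```

For example, a machine with a 6GB GPU would be routed to the Pipeline backend on the GPU, while a machine without a GPU falls back to Pipeline on the CPU.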

Section 05

Deployment and Usage Guide

  • Installation: via pip (uv pip install -U "mineru[all]"; the quotes keep shells such as zsh from expanding the brackets), from source, or with Docker;
  • Usage methods: CLI (GPU or CPU mode), FastAPI, or the Gradio WebUI (available as the official online version and on ModelScope and HuggingFace Spaces).
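
For batch jobs, the CLI can be driven from a script. The following is a minimal sketch, assuming the `mineru` executable is on the PATH and accepts `-p` for the input path and `-o` for the output directory, as in the project's README for earlier releases; these flags are an assumption for v3.0 and should be confirmed with `mineru --help`. The helper names `build_mineru_command` and `run_mineru` are hypothetical.

```python
import shlex
import subprocess
from typing import List


def build_mineru_command(input_path: str, output_dir: str) -> List[str]:
    """Build the argument list for one conversion.

    Assumption: `-p` is the input path and `-o` the output directory.
    """
    return ["mineru", "-p", input_path, "-o", output_dir]


def run_mineru(input_path: str, output_dir: str) -> int:
    """Run the CLI and return its exit code (requires MinerU to be installed)."""
    cmd = build_mineru_command(input_path, output_dir)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


# Building the command has no side effects, so it is safe to inspect first.
command = build_mineru_command("paper.pdf", "out")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.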

Section 06

Application Scenarios and Value

  • RAG systems: Convert PDF libraries into structured Markdown to improve retrieval accuracy;
  • Training data preparation: Batch process academic papers/reports and output clean training text;
  • Agent workflows: JSON output is suitable for integration; API calls support real-time document parsing.
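
In a RAG pipeline, the structured Markdown is typically split into chunks before indexing, and headings are a natural boundary. The sketch below shows one simple way to do that; `chunk_by_heading` is a hypothetical helper, not part of MinerU, and it assumes ATX-style headings (lines starting with `#`).

```python
import re
from typing import Dict, List

# ATX-style Markdown heading: one to six '#' characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _append_chunk(chunks: List[Dict[str, str]], title: str, body: List[str]) -> None:
    """Append a chunk unless both the title and the body are empty."""
    text = "\n".join(body).strip()
    if title or text:
        chunks.append({"title": title, "text": text})


def chunk_by_heading(markdown: str) -> List[Dict[str, str]]:
    """Split Markdown into chunks, starting a new chunk at every heading."""
    chunks: List[Dict[str, str]] = []
    title = ""
    body: List[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            _append_chunk(chunks, title, body)
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    _append_chunk(chunks, title, body)
    return chunks


chunks = chunk_by_heading("# Intro\nHello\n\n## Method\nStep one\nStep two\n")
```

This sketch treats any line starting with `#` as a heading, so Markdown that contains fenced code with comment lines would need extra handling before chunking.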

Section 07

Current Limitations and Future Directions

Limitations: Reading order may be incorrect for extremely complex layouts; support for vertical text is limited; code blocks are not supported; special formats such as comics and textbooks parse poorly; rows and columns in complex tables may be recognized incorrectly.

Future directions: The team will continue to improve the tool, and community feedback via GitHub is welcome.


Section 08

Conclusion: Evolution from Tool to Infrastructure

MinerU is evolving from a standalone tool into large-scale document parsing infrastructure. v3.0 reduces resource consumption while maintaining high accuracy, and it supports multi-GPU deployment and load balancing. For LLM application developers, it is an open-source tool worth trying. The project uses the AGPLv3 license, and related papers such as MinerU-Diffusion have been published.