Section 01
MinerU: An Open-Source Tool for Converting Complex Documents into LLM-Friendly Formats
MinerU is an open-source document parsing tool built to address the document-structuring problems that arise in the LLM and Agent era. It accepts multiple input formats, including PDFs, images, and DOCX files, and converts them into Markdown or JSON. With core features such as formula recognition, table extraction, and OCR, it is well suited as a preprocessing step when building Agent workflows and RAG systems.
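
To make the RAG preprocessing role concrete, here is a minimal sketch of what a downstream step might do with Markdown output once a parser has produced it. It does not call MinerU itself: the function name `chunk_markdown`, the `HEADING` pattern, and the sample text are all hypothetical, and the only assumption is that the converted document arrives as a Markdown string with `#`-style headings.

```python
import re
from typing import Dict, List

# Matches ATX-style Markdown headings such as "# Title" or "## Section".
# Assumption: the parser's Markdown output uses this heading style.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str) -> List[Dict[str, str]]:
    """Split a Markdown string into heading-scoped chunks for retrieval.

    Hypothetical helper for illustration only. It ignores fenced code
    blocks, so a '# comment' line inside a fence would be treated as a
    heading; a real pipeline would need to handle that case.
    """
    chunks: List[Dict[str, str]] = []
    title = ""            # heading of the chunk being collected
    body: List[str] = []  # lines belonging to that heading

    def flush() -> None:
        # Emit the current chunk only if it has non-empty text.
        text = "\n".join(body).strip()
        if text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Intro\nParsed text.\n\n## Tables\n| a | b |\n|---|---|\n| 1 | 2 |\n"
    for chunk in chunk_markdown(sample):
        print(chunk["title"], "->", len(chunk["text"]), "characters")
```

Splitting on headings keeps each table or formula together with the section that introduces it, which is one common reason to prefer structured Markdown over raw extracted text when building a retrieval index.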