Zing Forum

LightVLMInvoice: A Fully Local Visual Large Model System for Privacy-Preserving Document Information Extraction

A structured information extraction system for invoices and documents, built on a locally deployed VLM. It uses a decoupled front end and back end with an asynchronous task queue, parses multi-page PDFs automatically, and runs all inference locally to keep business data private and secure.

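Tags: LightVLMInvoice, visual large model, document information extraction, invoice recognition, local deployment, VLM, privacy protection, vLLM, OCR, structured data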
Published 2026-04-01 12:11 | Recent activity 2026-04-01 12:22 | Estimated read 6 min

Section 01

[Introduction] LightVLMInvoice: Overview of the Fully Local Visual Large Model Document Information Extraction System

LightVLMInvoice is a structured information extraction system for documents and invoices, built on a locally deployed visual large language model (VLM). It separates the front end from the back end, schedules work through an asynchronous task queue, and parses multi-page PDFs automatically, with all inference running locally. Its core design principle is "privacy first": it avoids the privacy and compliance risks that arise when sensitive data is sent to traditional cloud service APIs.

Section 02

Background: Privacy Pain Points and Needs in Enterprise Document Processing

During digital transformation, enterprises need to process large volumes of paper and electronic documents (invoices, contracts, reports, and so on). Traditional solutions rely on cloud service APIs, so sensitive business data is transmitted to external parties, which creates significant privacy and compliance risks. LightVLMInvoice is built around a locally deployed VLM and provides a fully offline document parsing solution that balances AI efficiency with data security.

Section 03

System Architecture and Technical Methods

  • Decoupled Front End and Back End: The front end uses React, Vite, TypeScript, and TailwindCSS; the back end is built on FastAPI, with Celery and Redis handling asynchronous task scheduling;
  • Inference Engine: vLLM serves the local VLM (the default is the quantized model cyankiwi/Qwen3.5-2B-AWQ-BF16-INT8, chosen for its low memory usage);
  • Fault Tolerance Mechanism: The json_repair library automatically fixes JSON syntax errors in the model output so that the extracted data remains valid JSON.
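
To illustrate the fault tolerance step, here is a minimal sketch of the parse-then-repair idea using only the Python standard library. The project itself delegates this work to json_repair; the helper below (parse_model_json, a hypothetical name) is a simplified stand-in that handles only two common faults, text wrapped around the JSON object and trailing commas, and it does not reflect how json_repair works internally.

```python
import json
import re
from typing import Any


def parse_model_json(raw: str) -> Any:
    """Parse model output as JSON, applying two simple repairs on failure.

    Hypothetical helper for illustration; a stand-in for json_repair.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass  # fall through to the repair attempts below

    text = raw.strip()
    # Repair 1: keep only the outermost {...} span, dropping code fences
    # or chatter that the model may have wrapped around the object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    text = text[start : end + 1]
    # Repair 2: remove trailing commas before a closing brace or bracket.
    # Caveat: this naive regex would also alter ",}" inside a string value.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```

If the repaired text is still invalid, json.loads raises json.JSONDecodeError (a ValueError subclass), which a caller could treat as a signal to retry the page.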

Section 04

Core Features

  • Complex File Support: Multi-page PDFs are parsed fully automatically; the back end splits them into single pages for processing;
  • Asynchronous and Non-blocking: Submitting a file returns a task ID, and the front end polls for progress and results;
  • High Robustness: Includes error retry, result verification, and exception handling mechanisms;
  • Fully Local and Offline: All inference and parsing run locally, with no dependency on external network services.
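
The submit-then-poll flow described above can be sketched with the standard library alone. This is not the project's code: the real system uses Celery and Redis, whereas this sketch substitutes a thread pool and an in-memory dict. The names TaskStore, submit, status, and the state strings are all hypothetical.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class TaskStore:
    """In-memory stand-in for a Celery + Redis task queue (illustration only)."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._tasks: dict[str, dict] = {}

    def submit(self, pages: list[bytes], parse_page: Callable[[bytes], dict]) -> str:
        """Queue one job per document and return its task ID immediately."""
        task_id = uuid.uuid4().hex
        with self._lock:
            self._tasks[task_id] = {
                "state": "PENDING", "done": 0, "total": len(pages), "results": [],
            }
        self._pool.submit(self._run, task_id, pages, parse_page)
        return task_id

    def _run(self, task_id: str, pages: list[bytes], parse_page) -> None:
        try:
            for page in pages:  # each page is processed on its own
                result = parse_page(page)
                with self._lock:
                    task = self._tasks[task_id]
                    task["results"].append(result)
                    task["done"] += 1
                    task["state"] = "PROGRESS"
            with self._lock:
                self._tasks[task_id]["state"] = "SUCCESS"
        except Exception as exc:  # record the failure so pollers can see it
            with self._lock:
                self._tasks[task_id].update(state="FAILURE", error=str(exc))

    def status(self, task_id: str) -> dict:
        """Return a snapshot of the task that a client could poll for."""
        with self._lock:
            task = self._tasks[task_id]
            return {**task, "results": list(task["results"])}

    def shutdown(self) -> None:
        """Wait for queued work to finish and release the worker threads."""
        self._pool.shutdown(wait=True)
```

A client would call status(task_id) at an interval until the state becomes SUCCESS or FAILURE, using done and total to display progress.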

Section 05

Deployment and Configuration Guide

  • Environment Requirements: Docker and Docker Compose, plus an NVIDIA GPU with the NVIDIA Container Toolkit;
  • Quick Start: Clone the project → enter the docker directory → run docker-compose up -d --build;
  • Access Addresses: Front end at http://localhost:8002, back-end API documentation at http://localhost:8005/docs;
  • Parameter Configuration: Adjust ports, concurrency (CELERY_CONCURRENCY), model parameters, and other settings via the .env file.
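
As an illustration of how such settings might be read on the back end, the sketch below loads values from environment variables with defaults. Only CELERY_CONCURRENCY is named in the source. FRONTEND_PORT, BACKEND_PORT, MODEL_NAME, the Settings class, and the concurrency default of 2 are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    frontend_port: int
    backend_port: int
    celery_concurrency: int
    model_name: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env

    def as_int(key: str, default: int) -> int:
        # int() raises ValueError on non-numeric input such as "abc".
        value = int(env.get(key, default))
        if value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value}")
        return value

    return Settings(
        frontend_port=as_int("FRONTEND_PORT", 8002),  # hypothetical variable name
        backend_port=as_int("BACKEND_PORT", 8005),  # hypothetical variable name
        celery_concurrency=as_int("CELERY_CONCURRENCY", 2),  # default is an assumption
        model_name=env.get(  # hypothetical variable name
            "MODEL_NAME", "cyankiwi/Qwen3.5-2B-AWQ-BF16-INT8"
        ),
    )
```

Passing a plain dict instead of relying on os.environ keeps the loader easy to test; failing fast on a non-positive value surfaces a bad .env entry at startup rather than at the first task.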

Section 06

Application Scenarios

The system suits scenarios such as financial invoice processing (extracting invoice numbers, amounts, and similar fields), contract parsing (key clauses and signatories), document information entry (ID cards and business licenses), and report data extraction (converting tables into structured form).
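
For the invoice scenario, a downstream check on the extracted fields might look like the sketch below. The source does not specify an output schema, so the field names (invoice_number, total_amount) and the validate_invoice helper are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def validate_invoice(record: dict) -> list[str]:
    """Return a list of problems found in an extracted record (empty if none)."""
    problems = []

    number = record.get("invoice_number")
    if not isinstance(number, str) or not number.strip():
        problems.append("invoice_number is missing or empty")

    amount = record.get("total_amount")
    try:
        # Decimal avoids float rounding issues on currency values.
        if Decimal(str(amount)) < 0:
            problems.append("total_amount is negative")
    except InvalidOperation:
        # Raised for missing values and for text that is not a plain number.
        problems.append("total_amount is not a number")

    return problems
```

A non-empty result could be used to flag the page for manual review or to trigger a retry.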

Section 07

Limitations and Improvement Directions

  • Current Limitations: Requires an NVIDIA GPU; recognition of complex tables and handwriting needs improvement; supports single-node deployment only;
  • Future Improvements: Integrate more open-source VLMs, support load balancing across a GPU pool, optimize batch processing efficiency, and add confidence scores to results.

Section 08

Local Deployment vs. Cloud Services: Trade-offs and Conclusion

  • Local Deployment Advantages: Data privacy (data never leaves the local environment), controllable cost, low latency, offline availability;
  • Cloud Service Advantages: No maintenance burden, elastic scaling, automatic model updates;
  • Conclusion: For enterprises concerned about data security, LightVLMInvoice offers a solution that balances efficiency and privacy, and it is an open-source option worth evaluating.