Zing Forum

LightVLMInvoice: A Privacy-Friendly Invoice Information Extraction System Based on Local Visual Large Language Models

A fully locally deployed visual large language model solution that helps small and medium-sized enterprises (SMEs) and individual developers extract structured information from complex invoices and documents while protecting sensitive financial data.

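Tags: visual large language model, VLM, invoice recognition, OCR, local deployment, privacy protection, document structuring, FastAPI, Celery, Docker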
Published 2026-06-01 01:44 | Recent activity 2026-06-01 01:50 | Estimated read 5 min

Section 01

LightVLMInvoice: Local VLM-Powered Privacy-Friendly Invoice Extraction

LightVLMInvoice is a fully locally deployed visual large language model (VLM) solution for extracting structured information from invoices and other documents. Its design puts privacy first: all processing stays on local hardware, so sensitive financial data is not uploaded to third-party servers. It handles complex documents (multi-page PDFs, scanned images), avoids the recurring fees of cloud OCR, and deploys with Docker. The project targets SMEs, developers, and privacy-sensitive users.

Section 02

Background & Pain Points of Current Solutions

In digital office environments, automated invoice processing is critical for efficiency, but many users face a dilemma: cloud OCR services are costly, and they carry a privacy risk because sensitive data leaves the local system. OCR services that depend on external APIs are a poor fit for confidential or compliance-heavy scenarios, and batch processing through cloud services keeps accumulating costs over time. LightVLMInvoice was created to offer a local, privacy-first alternative.

Section 03

Technical Architecture Deep Dive

Frontend: React + Vite + TypeScript + TailwindCSS, served via Nginx in production.
Backend: FastAPI, a high-performance asynchronous web framework.
Async queue: Celery + Redis for non-blocking task handling; users receive a task ID and poll for status instead of waiting on the request.
Model engine: vLLM, an inference engine optimized for serving large models, with the default model cyankiwi/Qwen3.5-2B-AWQ-BF16-INT8 (quantized, low memory requirement).
Output repair: the json_repair library fixes the VLM's occasional JSON formatting errors (e.g., missing quotes, invalid numbers).
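
The submit-then-poll flow described for the async queue can be sketched with the standard library alone. This is a minimal illustration, not the project's code: a thread pool and an in-memory dict stand in for Celery and Redis, and the function names (submit, status, poll, extract_invoice) and the response shape are hypothetical. Only the state names PENDING, SUCCESS, and FAILURE are borrowed from Celery's vocabulary.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-ins (assumptions): a thread pool instead of a Celery worker,
# and a dict instead of the Redis result backend.
_executor = ThreadPoolExecutor(max_workers=1)
_tasks: dict[str, Future] = {}


def extract_invoice(pages: list[str]) -> dict:
    """Placeholder for the slow VLM call; returns a fake structured result."""
    time.sleep(0.05)  # pretend the model is working
    return {"page_count": len(pages), "fields": {}}


def submit(pages: list[str]) -> str:
    """Queue the job and return a task ID immediately, without waiting."""
    task_id = uuid.uuid4().hex
    _tasks[task_id] = _executor.submit(extract_invoice, pages)
    return task_id


def status(task_id: str) -> dict:
    """What a status endpoint might return for one task (hypothetical shape)."""
    future = _tasks.get(task_id)
    if future is None:
        return {"state": "UNKNOWN"}
    if not future.done():
        return {"state": "PENDING"}
    if future.exception() is not None:
        return {"state": "FAILURE", "error": str(future.exception())}
    return {"state": "SUCCESS", "result": future.result()}


def poll(task_id: str, interval: float = 0.01, timeout: float = 5.0) -> dict:
    """Client-side loop: ask for the status until the task settles or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = status(task_id)
        if current["state"] != "PENDING":
            return current
        time.sleep(interval)
    return {"state": "TIMEOUT"}
```

Calling submit(["page-1", "page-2"]) returns right away with an ID, and poll(task_id) then blocks only the caller that chose to wait. In a real deployment, the request handler stays free in the same way while a worker does the slow model call.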

Section 04

Core Functional Features

1. Complex document support: parses multi-page PDFs and scanned images, splitting them into pages for batch processing.
2. Async workflow: users get a task ID immediately, and the frontend polls for status, so the UI never blocks.
3. JSON recovery: automatically fixes problems in the VLM's output to ensure valid structured data.
4. Fully local: all processing stays on the user's machine, and no data leaves the local environment.
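
To make the JSON recovery idea concrete, here is a small standard-library sketch. It is not how json_repair works internally and it covers far fewer cases; it only handles two common slips (extra text around the object and trailing commas). The helper name recover_json is hypothetical.

```python
import json
import re


def recover_json(raw: str) -> dict:
    """Best-effort parse of a model reply that should contain one JSON object."""
    # Keep only the outermost {...}; this drops code fences and chatter
    # that a model sometimes wraps around its answer.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    candidate = raw[start : end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass
    # Common slip: a trailing comma before a closing brace or bracket.
    # Naive on purpose: it would also touch such a sequence inside a string.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    return json.loads(candidate)
```

For example, recover_json('Sure! {"invoice_no": "INV-001", "total": 128.5,} Done.') yields a dict with the two fields, while input with no braces at all raises ValueError. A dedicated library is the better choice in practice, since hand-written patches like this one miss cases such as unquoted keys.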

Section 05

Deployment & Usage Guide

Environment requirements: Docker and Docker Compose, plus an NVIDIA GPU with the NVIDIA Container Toolkit for GPU acceleration.
Quick start: clone the repo, cd into the docker directory, and run docker-compose up -d --build.
Access: the frontend is at http://localhost:8002 and the backend API docs are at http://localhost:8005/docs.
Tuning: adjust environment variables such as CELERY_CONCURRENCY (default 1; can be set higher with 16 GB or more of VRAM) and VLLM_MODEL (to swap in a compatible model).
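
As a rough illustration of how the two tuning variables might be read on the Python side, the sketch below loads them with the defaults mentioned in this post. The WorkerSettings class and its validation are assumptions for illustration only; the project may read its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkerSettings:
    # Defaults taken from the post; the class itself is hypothetical.
    celery_concurrency: int = 1
    vllm_model: str = "cyankiwi/Qwen3.5-2B-AWQ-BF16-INT8"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "WorkerSettings":
        """Build settings from environment variables, falling back to defaults."""
        concurrency = int(env.get("CELERY_CONCURRENCY", "1"))
        if concurrency < 1:
            raise ValueError("CELERY_CONCURRENCY must be at least 1")
        return cls(
            celery_concurrency=concurrency,
            vllm_model=env.get("VLLM_MODEL", cls.vllm_model),
        )
```

Passing a plain dict, as in WorkerSettings.from_env({"CELERY_CONCURRENCY": "2"}), makes the parsing easy to check without touching the real environment.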

Section 06

Application Scenarios & Value

Ideal for:
1. SMEs: batch invoice processing without cloud costs or privacy risks.
2. Accounting firms: handling clients' sensitive financial data locally.
3. Developers: a complete reference for learning VLM application development.
4. Privacy-sensitive industries: medical, legal, and finance, where strict data privacy rules apply.

Section 07

Future Plans & License

Future plans: synthetic sample invoices, automated tests (PDF splitting, JSON repair), stricter output validation, improved batch status in the frontend, GPU-specific configuration docs, and versioned releases.
License: MIT (free to use, modify, and distribute).