
From Demo to Production: An HR Multi-Agent Platform with Continuous Evolution Capability

This article analyzes hr-intelligence-platform, an HR data platform and multi-agent system built for production environments. Moving beyond the limits of typical demo-level agents, the project shows how to deploy AI agents safely in a sensitive business domain through a human-machine collaborative improvement loop, complete audit trails, and role-separated governance.

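HR agents, LangGraph, human-machine collaboration, continuous improvement, production governance, role separation, audit trail, RAG, multi-agent system, compliance and security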

Published 2026-06-05 01:45


Section 01

Introduction: Core Value and Innovations of the HR Multi-Agent Platform

hr-intelligence-platform is an HR data platform and multi-agent system built for production use. Its central question is how to let agents keep improving while remaining controllable, and it answers with three mechanisms: a human-machine collaborative improvement loop, complete audit trails, and role-separated governance.

Section 02

Background: Pain Points and Challenges of LLM Agents from Demo to Production

Most LLM applications today are still at the "works in a demo" stage and tend to break down under real-world complexity. In a sensitive domain such as HR, an incorrect salary answer or a data leak can become a compliance incident with legal consequences. The project is designed around these pain points, with the goal of building a complete production-oriented system.

Section 03

System Architecture: Collaborative Design of Data Platform and Multi-Agent System

HR Data Platform Layer

The platform manages 84 third-level data categories drawn from four data sources (including Feishu synchronization and manual upload), with three business units as the organizing dimension. Salary access requires a secondary verification that expires after 30 minutes (a TTL), and permissions are bound to positions rather than to individuals so that separation of duties cannot quietly break down.
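The salary rule combines two checks: the position must grant salary access, and the user's secondary verification must be younger than 30 minutes. The sketch below is a minimal illustration under assumptions: the position names, the `salary:read` permission string, and the `SalaryGate` class are hypothetical, and the article does not describe how verification itself is performed.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# 30-minute re-verification window, as described in the article.
SALARY_VERIFY_TTL_SECONDS = 30 * 60

# Hypothetical position names: permissions attach to the position, not to the individual.
POSITION_PERMISSIONS: Dict[str, Set[str]] = {
    "hr_business_admin": {"salary:read", "ops:read"},
    "tech_admin": {"ops:read", "system:maintain"},
    "employee": {"ops:read"},
}


@dataclass
class SalaryGate:
    """Tracks when each user last passed secondary verification (hypothetical helper)."""

    clock: Callable[[], float] = time.monotonic
    _verified_at: Dict[str, float] = field(default_factory=dict)

    def record_verification(self, user_id: str) -> None:
        # Call only after the user has passed the second factor.
        self._verified_at[user_id] = self.clock()

    def can_read_salary(self, user_id: str, position: str) -> bool:
        # The position must grant salary access at all.
        if "salary:read" not in POSITION_PERMISSIONS.get(position, set()):
            return False
        verified_at = self._verified_at.get(user_id)
        if verified_at is None:
            return False
        # Verification expires once the TTL has elapsed.
        return self.clock() - verified_at < SALARY_VERIFY_TTL_SECONDS
```

Injecting the clock keeps the TTL logic testable; a fake clock advanced past 1,800 seconds should flip `can_read_salary` to `False` without any sleeping.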

Multi-Agent System Layer

The agent layer is built on LangGraph and uses two-tier orchestration: a planner plus a supervisor. The planner recognizes intent semantically (with keyword matching only as a fail-safe), and the supervisor dispatches the work to one of five specialized agents, such as a parser or a retriever. The split keeps routing flexible while keeping execution predictable.
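The planner/supervisor split can be illustrated without any framework. This is a framework-free sketch, not the project's LangGraph graph: `llm_classify` stands in for an LLM intent classifier, and the intent labels and keyword table are hypothetical, since the article does not list them.

```python
from typing import Callable, Dict, Optional

# Hypothetical intent labels and keyword fail-safes; the article does not list the real ones.
KEYWORD_FALLBACK: Dict[str, str] = {
    "salary": "salary_query",
    "policy": "policy_qa",
    "upload": "document_parse",
}


def plan(query: str, llm_classify: Callable[[str], Optional[str]]) -> str:
    """Planner: semantic intent first; keywords only when the LLM gives no answer."""
    try:
        intent = llm_classify(query)
    except Exception:
        intent = None  # e.g. timeout or malformed output
    if intent:
        return intent
    lowered = query.lower()
    for keyword, fallback_intent in KEYWORD_FALLBACK.items():
        if keyword in lowered:
            return fallback_intent
    return "unknown"


def supervise(intent: str, agents: Dict[str, Callable[[str], str]], query: str) -> str:
    """Supervisor: hand the query to exactly one specialist agent, or decline."""
    agent = agents.get(intent)
    if agent is None:
        return "Sorry, I can't handle that request."
    return agent(query)
```

In LangGraph the same shape would typically map to nodes and conditional edges; keeping the keyword table as a fallback, rather than the primary router, is what lets semantic routing stay in charge while still degrading gracefully when the LLM call fails.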

Section 04

Core Innovation: Human-Machine Collaborative Improvement Loop Mechanism

Tracking and Feedback Collection

Every run produces a detailed execution trace in which the query is stored as a hash to protect sensitive content, and users can rate each answer with a thumbs-up or thumbs-down.
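A trace record that keeps the query only as a hash might look like the following sketch. The field names are hypothetical, and the choice of a keyed HMAC-SHA256 (rather than a plain hash) is an assumption on our part: the article says only that queries are hashed.

```python
import hashlib
import hmac
import time
import uuid
from typing import List

# Hypothetical secret, loaded from config in practice. A keyed hash (HMAC-SHA256)
# is harder to reverse by guessing common queries than a bare SHA-256.
TRACE_HASH_KEY = b"replace-with-secret-from-config"


def hash_query(query: str) -> str:
    return hmac.new(TRACE_HASH_KEY, query.encode("utf-8"), hashlib.sha256).hexdigest()


def make_trace(query: str, intent: str, steps: List[str]) -> dict:
    return {
        "trace_id": str(uuid.uuid4()),
        "query_hash": hash_query(query),  # the raw query text is never stored
        "intent": intent,
        "steps": list(steps),  # e.g. the agents visited, in order
        "created_at": time.time(),
        "feedback": None,  # filled in later by record_feedback
    }


def record_feedback(trace: dict, vote: str) -> dict:
    if vote not in ("up", "down"):
        raise ValueError("vote must be 'up' or 'down'")
    trace["feedback"] = vote
    return trace
```

Because the same query always yields the same hash, repeated complaints about one question can still be grouped later without the trace store ever holding the text.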

Automatic Review Agent

Once a week, a review agent automatically clusters the negative cases and produces a report with two views: a business summary for HR decision-makers and a technical detail layer that engineers use to make fixes.
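The two-view output can be sketched as follows. The article does not say how the review agent clusters cases; this simplified stand-in groups thumbs-down traces by exact key (intent plus a hypothetical `failure_stage` label), where the real system may use embeddings or an LLM.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def review_negative_cases(traces: List[dict]) -> Dict[str, list]:
    """Group thumbs-down traces and emit a business view and a technical view."""
    clusters: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for t in traces:
        if t.get("feedback") == "down":
            # Hypothetical grouping key: intent plus the pipeline stage judged to have failed.
            key = (t.get("intent", "unknown"), t.get("failure_stage", "unlabelled"))
            clusters[key].append(t)

    # Business view: how many complaints per intent, largest first.
    per_intent: Counter = Counter()
    for (intent, _stage), cases in clusters.items():
        per_intent[intent] += len(cases)
    business_summary = [f"{intent}: {n} thumbs-down" for intent, n in per_intent.most_common()]

    # Technical view: each cluster with the trace IDs an engineer needs to reproduce it.
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    technical_details = [
        {"intent": intent, "failure_stage": stage, "trace_ids": [c["trace_id"] for c in cases]}
        for (intent, stage), cases in ranked
    ]
    return {"business_summary": business_summary, "technical_details": technical_details}
```

The function only reports; it never edits prompts or code, which matches the "review without automatic fixes" principle.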

Improvement Work Orders and Test Gates

Business administrators review the findings and turn the ones worth acting on into work orders. Test gates are hard rules: a change cannot be released if its tests fail, so the process is enforced rather than merely recommended.
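A hard test gate reduces to a release function with no override path. In this minimal sketch the `WorkOrder` fields are hypothetical, and pytest is assumed as the test runner purely for illustration; any non-zero exit code (including "no tests collected") blocks the release.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkOrder:
    order_id: str
    approved_by_business_admin: bool
    status: str = "open"


def pytest_passes() -> bool:
    """Run the test suite; any non-zero pytest exit code blocks release."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
    return result.returncode == 0


def release(order: WorkOrder, run_tests: Callable[[], bool] = pytest_passes) -> WorkOrder:
    # A human decision comes first: unapproved findings never reach the gate.
    if not order.approved_by_business_admin:
        raise PermissionError(f"work order {order.order_id} is not approved")
    # Hard gate: a failing suite blocks the release; there is no override flag.
    order.status = "released" if run_tests() else "blocked"
    return order
```

Passing `run_tests` as a parameter lets the gate be exercised with a stub, while production calls use the real suite.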

Section 05

Role Separation and Compliance Governance: Security Assurance for Sensitive Scenarios

The platform defines three roles:

Business Administrators: Can access salary data, but only within the TTL window; every operation is audited, and the audit log records the action, not the salary values.
Technical Administrators: Operate and maintain the system but cannot view salary values, which keeps the two duties isolated.
General Employees: Can access only the operational data relevant to them; salary data is isolated from them at three levels: intent classification, returned fields, and the interface layer.
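The three isolation levels for salary data can be sketched as three independent checks, so that a gap in one layer is still caught by the next. All names here (intent label, field names, role string) are hypothetical, and the TTL check on business administrators is omitted for brevity.

```python
from typing import Dict

# Hypothetical names for the sensitive intent, fields, and privileged role.
SALARY_INTENTS = {"salary_query"}
SALARY_FIELDS = {"base_salary", "bonus", "salary_band"}
SALARY_ROLES = {"business_admin"}


def intent_allowed(role: str, intent: str) -> bool:
    """Level 1: block salary intents before any agent runs."""
    return intent not in SALARY_INTENTS or role in SALARY_ROLES


def filter_fields(role: str, record: Dict[str, object]) -> Dict[str, object]:
    """Level 2: strip salary fields from anything returned to a non-privileged role."""
    if role in SALARY_ROLES:
        return dict(record)
    return {k: v for k, v in record.items() if k not in SALARY_FIELDS}


def get_salary(role: str, employee_id: str, store: Dict[str, dict]) -> dict:
    """Level 3: the interface itself refuses callers without the privileged role."""
    if role not in SALARY_ROLES:
        raise PermissionError("salary interface requires business_admin")
    return store[employee_id]
```

Note that a technical administrator falls through all three checks exactly like a general employee, which is how the duty isolation above is enforced in code.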

Section 06

Technical Implementation Details: RAG, Evaluation, and Tech Stack

RAG Strategy

Retrieval combines Qwen embeddings, hybrid search (vector plus keyword), and re-ranking. When retrieval returns zero hits, the system explicitly refuses to answer instead of fabricating one.
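The retrieve-or-refuse behavior can be sketched as follows. Assumptions: embeddings are precomputed elsewhere (in the project, by a Qwen embedding model); the two rankings are merged with reciprocal rank fusion (RRF), which is our choice since the article does not say how results are fused; and the `min_cosine` threshold and `rerank` hook are hypothetical.

```python
import math
import re
from typing import Callable, List, Optional, Sequence


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query: str, query_vec: Sequence[float], docs: List[str],
                  doc_vecs: List[Sequence[float]], top_k: int = 3,
                  min_cosine: float = 0.3, rrf_k: int = 60) -> List[int]:
    """Fuse a vector ranking and a keyword ranking with reciprocal rank fusion (RRF)."""
    sims = [cosine(query_vec, v) for v in doc_vecs]
    q_terms = _tokens(query)
    overlaps = [len(q_terms & _tokens(d)) for d in docs]
    vector_rank = sorted((i for i, s in enumerate(sims) if s >= min_cosine),
                         key=lambda i: sims[i], reverse=True)
    keyword_rank = sorted((i for i, o in enumerate(overlaps) if o > 0),
                          key=lambda i: overlaps[i], reverse=True)
    fused: dict = {}
    for ranking in (vector_rank, keyword_rank):
        for rank, i in enumerate(ranking):
            fused[i] = fused.get(i, 0.0) + 1.0 / (rrf_k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


def retrieve_or_refuse(query: str, query_vec: Sequence[float], docs: List[str],
                       doc_vecs: List[Sequence[float]],
                       rerank: Optional[Callable[[str, List[str]], List[str]]] = None
                       ) -> Optional[List[str]]:
    hits = hybrid_search(query, query_vec, docs, doc_vecs)
    if not hits:
        return None  # zero hits: the caller must refuse rather than let the LLM guess
    passages = [docs[i] for i in hits]
    # A cross-encoder or LLM re-ranker would plug in here.
    return rerank(query, passages) if rerank else passages
```

Returning `None` instead of an empty context makes the refusal an explicit branch for the caller, so the generator is never invoked with nothing to ground its answer.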

Evaluation System

Metrics are organized in three layers: intent recognition accuracy, retrieval hit rate, and answer quality (scored by an LLM-as-Judge). Evaluations can run automatically on a schedule or be triggered on demand.
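The three metric layers can be computed in one pass over a labelled evaluation set. In this sketch the case schema is hypothetical, "hit rate" is taken to mean the share of cases where at least one gold document was retrieved, and `judge` is a placeholder for whatever LLM-as-Judge call returns a score in [0, 1].

```python
from statistics import mean
from typing import Callable, Dict, List


def evaluate(cases: List[dict], judge: Callable[[str, str, str], float]) -> Dict[str, float]:
    """Compute the three metric layers over a labelled evaluation set.

    Each case holds: question, expected_intent, predicted_intent,
    gold_doc_ids, retrieved_doc_ids, answer, reference (hypothetical schema).
    """
    if not cases:
        raise ValueError("evaluation set is empty")
    intent_accuracy = mean(
        1.0 if c["predicted_intent"] == c["expected_intent"] else 0.0 for c in cases
    )
    # Hit rate: fraction of cases where at least one gold document was retrieved.
    retrieval_hit_rate = mean(
        1.0 if set(c["gold_doc_ids"]) & set(c["retrieved_doc_ids"]) else 0.0 for c in cases
    )
    # Answer quality: average LLM-as-Judge score.
    answer_quality = mean(judge(c["question"], c["answer"], c["reference"]) for c in cases)
    return {
        "intent_accuracy": intent_accuracy,
        "retrieval_hit_rate": retrieval_hit_rate,
        "answer_quality": answer_quality,
    }
```

Keeping the layers separate shows where a regression originates: a drop in answer quality with a stable hit rate points at generation rather than retrieval.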

Tech Stack

Backend: Python + FastAPI + PostgreSQL (pgvector) + Celery + LangGraph; LLM: Qwen (embedding and chat); Frontend: plain HTML/JS; Deployment: Docker Compose.

Section 07

Design Philosophy: Key Principles for Production-Grade AI Systems

Semantic Routing Over Keyword Enumeration: Route with the LLM's semantic understanding and keep keywords only as a fail-safe;
Position-Bound Permissions Over Fine-Grained Switches: Bind salary permissions to positions rather than per-user toggles, to prevent abuse;
Defense in Depth and Pre-Gates: Run sensitive-data checks up front so that the same policy applies on every path;
Review Without Automatic Fixes: The review agent only identifies problems; every improvement requires a human decision and must pass the test gate.

Section 08

Conclusion: Benchmark Practice for Production-Grade AI Agents

The project offers a complete reference for taking AI agents from demo to production. Its value lies in treating production complexity (auditing, rollback, permission isolation, controlled improvement) as a system-level concern. For developers of enterprise LLM applications, its improvement loop, role separation, and test gates merit close study, and they all serve one point: a production system must be manageable, auditable, and controllable.
