Zing Forum


From Demo to Production: An HR Multi-Agent Platform with Continuous Evolution Capability

This article analyzes hr-intelligence-platform, an HR data platform and multi-agent system built for production use. Going beyond typical demo-level agents, the project shows how to deploy AI agents safely in sensitive business scenarios through a human-in-the-loop improvement cycle, complete audit trails, and role-separated governance.

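Tags: HR agents, LangGraph, human-machine collaboration, continuous improvement, production governance, role separation, audit trails, RAG, multi-agent systems, compliance and security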
Published 2026-06-05 01:45 | Recent activity 2026-06-05 01:48 | Estimated read 7 min

Section 01

[Introduction] From Demo to Production: Core Value and Innovations of the HR Multi-Agent Platform

The hr-intelligence-platform project is an HR data platform and multi-agent system built for production environments. Where demo-level agents stop at a working prototype, it tackles the safe deployment of AI agents in sensitive business scenarios through a human-in-the-loop improvement cycle, complete audit trails, and role-separated governance. The central question it addresses is how to let agents keep evolving while remaining controllable.


Section 02

Background: Pain Points and Challenges of LLM Agents from Demo to Production

Most LLM applications today remain at the "good enough for a demo" stage and tend to break down when they meet real-world complexity. The stakes are especially high in sensitive HR domains, where a wrong answer to a salary query or a data leak can cause compliance incidents and legal risk. This project is designed to address these pain points by building a complete, production-oriented system.


Section 03

System Architecture: Collaborative Design of Data Platform and Multi-Agent System

HR Data Platform Layer

The platform manages 84 third-level data categories drawn from four data sources, including Feishu synchronization and manual upload, and organizes them around three business units as the anchoring dimension. Access to salary data requires a secondary verification that expires after a 30-minute TTL, and permissions are bound to positions so that separation of duties does not break down.
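To make the 30-minute TTL concrete, here is a minimal sketch of how a secondary-verification window could be checked. The class name `SalaryAccessGate`, its methods, and the in-memory dictionary are assumptions made for illustration; the article does not describe how the project stores verification state.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

SALARY_TTL_SECONDS = 30 * 60  # the 30-minute TTL described in the article


@dataclass
class SalaryAccessGate:
    """Hypothetical helper: remembers when each user last passed secondary verification."""

    ttl_seconds: int = SALARY_TTL_SECONDS
    verified_at: Dict[str, float] = field(default_factory=dict)

    def record_verification(self, user_id: str, now: Optional[float] = None) -> None:
        # Store the timestamp of a successful secondary verification.
        self.verified_at[user_id] = time.time() if now is None else now

    def is_allowed(self, user_id: str, now: Optional[float] = None) -> bool:
        # Access is allowed only while the last verification is younger than the TTL.
        now = time.time() if now is None else now
        last = self.verified_at.get(user_id)
        return last is not None and (now - last) < self.ttl_seconds
```

Passing `now` explicitly keeps the check easy to test; a production version would more likely keep this state in a shared store rather than in process memory.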

Multi-Agent System Layer

Built on the LangGraph framework, the system uses two-layer scheduling with a planner and a supervisor. The planner handles semantic intent recognition, with keyword matching as a fail-safe, and the supervisor dispatches tasks to five specialized agents, such as parsers and retrievers. This design balances flexibility with predictability.
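The planner and supervisor split can be sketched in plain Python, without LangGraph, to show the control flow only. Everything here is an assumption for illustration: the `classify` callable stands in for the LLM-based intent recognizer, the keyword table is invented, and only the two agents the article names (parser and retriever) appear.

```python
from typing import Callable, Dict, Optional

# Keyword fail-safe table; these keywords are illustrative assumptions.
KEYWORD_FALLBACK = {
    "upload": "parser",
    "find": "retriever",
}

NO_ROUTE_MESSAGE = "Sorry, I could not route this request."


def plan(query: str, classify: Callable[[str], Optional[str]]) -> Optional[str]:
    """Planner: try semantic classification first, then fall back to keywords."""
    try:
        intent = classify(query)
    except Exception:
        intent = None  # a classifier failure is treated as "no semantic answer"
    if intent is not None:
        return intent
    lowered = query.lower()
    for keyword, agent_name in KEYWORD_FALLBACK.items():
        if keyword in lowered:
            return agent_name
    return None


def supervise(
    query: str,
    classify: Callable[[str], Optional[str]],
    agents: Dict[str, Callable[[str], str]],
) -> str:
    """Supervisor: dispatch the query to the agent chosen by the planner."""
    handler = agents.get(plan(query, classify))
    if handler is None:
        return NO_ROUTE_MESSAGE
    return handler(query)
```

The keyword table is consulted only when the semantic step returns nothing or fails, which is what makes it a fail-safe and not the primary router.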


Section 04

Core Innovation: Human-Machine Collaborative Improvement Loop Mechanism

Tracking and Feedback Collection

Each run generates a detailed execution trace, with the query hashed to protect sensitive information, and users can rate the answer with a like or a dislike.
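One way to store a trace without the raw query is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the choice of HMAC, the normalization step, and the trace field names are assumptions, since the article only says that queries are hashed.

```python
import hashlib
import hmac
import time
from typing import List, Optional


def hash_query(query: str, secret: bytes) -> str:
    """Return a keyed hash of the normalized query (lowercased, whitespace collapsed)."""
    normalized = " ".join(query.lower().split())
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trace(
    query: str,
    steps: List[str],
    secret: bytes,
    feedback: Optional[str] = None,
) -> dict:
    """Build a trace record that keeps the hash and the steps, never the raw query text."""
    return {
        "query_hash": hash_query(query, secret),
        "steps": list(steps),
        "feedback": feedback,  # e.g. "like", "dislike", or None
        "created_at": time.time(),
    }
```

A keyed hash still lets identical queries be grouped during review, while making it harder to recover short queries by guessing than an unkeyed hash would.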

Automatic Review Agent

Every week, a review agent automatically clusters the negative cases and produces two views of the result: a business summary for HR decision-makers and a technical detail view for the engineers who fix the issues.
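The dual-view output can be illustrated with a deliberately simple stand-in for clustering: grouping disliked traces by intent and failing step. The grouping key and the trace fields (`intent`, `failed_step`, `query_hash`) are assumptions; the article does not say which clustering method the review agent uses.

```python
from collections import defaultdict
from typing import Dict, List


def review_negative_cases(traces: List[dict]) -> Dict[str, list]:
    """Group disliked traces and render a business view and a technical view."""
    clusters = defaultdict(list)
    for trace in traces:
        if trace["feedback"] == "dislike":
            key = (trace["intent"], trace["failed_step"])
            clusters[key].append(trace["query_hash"])

    # Largest clusters first; ties are broken by the key for stable output.
    ranked = sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))

    business_summary = [
        f"{len(hashes)} disliked answer(s) about '{intent}'"
        for (intent, _step), hashes in ranked
    ]
    technical_details = [
        {"intent": intent, "failed_step": step, "query_hashes": hashes}
        for (intent, step), hashes in ranked
    ]
    return {"business_summary": business_summary, "technical_details": technical_details}
```

Both views come from the same clusters, so the HR-facing summary and the engineering detail cannot drift apart.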

Improvement Work Orders and Test Gates

Business administrators review the findings and convert them into work orders. Test gates are hard rules: a modification cannot be released if its tests fail, which enforces the process.
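A hard test gate can be sketched as a release function that refuses to deploy unless the test runner reports success. The names `release`, `ReleaseBlocked`, and `run_pytest` are hypothetical, and running pytest as the suite is an assumption; the article does not name the project's test tooling.

```python
import subprocess
import sys
from typing import Callable


class ReleaseBlocked(RuntimeError):
    """Raised when a work order fails the test gate."""


def run_pytest() -> bool:
    """Example runner: True only if pytest exits with status 0 (assumes pytest is installed)."""
    return subprocess.run([sys.executable, "-m", "pytest", "-q"]).returncode == 0


def release(work_order_id: str, run_tests: Callable[[], bool], deploy: Callable[[], None]) -> None:
    """Deploy a change only after its tests pass; there is no override path."""
    if not run_tests():
        raise ReleaseBlocked(f"Work order {work_order_id}: tests failed, release blocked")
    deploy()
```

Because `deploy` is only reachable after `run_tests` returns `True`, the gate is enforced by the code path itself and does not depend on reviewers remembering to check.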


Section 05

Role Separation and Compliance Governance: Security Assurance for Sensitive Scenarios

The platform defines a three-layer role system:

  • Business Administrators: Can access salary data, subject to the TTL constraint; all operations are audited (the audit log records the action, not the salary values).
  • Technical Administrators: Operate and maintain the system but cannot view salary values, which keeps duties isolated.
  • General Employees: Can access only the operational data relevant to them; salary data is blocked at three levels: intent classification, returned fields, and the interface layer.
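The returned-field level of this isolation, together with "record the behavior, not the values" auditing, can be sketched as follows. The role keys, the salary field names, and the audit entry shape are assumptions for illustration, and the TTL check for business administrators is left out to keep the example short.

```python
from typing import Dict, List

# Hypothetical salary field names; the real schema is not given in the article.
SALARY_FIELDS = {"base_salary", "bonus"}

# Only business administrators may see salary values.
ROLE_CAN_VIEW_SALARY = {
    "business_admin": True,
    "technical_admin": False,
    "employee": False,
}


def filter_record(record: Dict[str, object], role: str, audit_log: List[dict]) -> Dict[str, object]:
    """Strip salary fields for roles without access and audit salary views."""
    allowed = ROLE_CAN_VIEW_SALARY.get(role, False)  # unknown roles get no access
    visible = {k: v for k, v in record.items() if allowed or k not in SALARY_FIELDS}
    if allowed:
        # The audit entry names the fields that were viewed, never their values.
        viewed = sorted(k for k in record if k in SALARY_FIELDS)
        audit_log.append({"role": role, "action": "view_salary", "fields": viewed})
    return visible
```

Defaulting unknown roles to no access means a misconfigured role fails closed instead of leaking salary fields.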

Section 06

Technical Implementation Details: RAG, Evaluation, and Tech Stack

RAG Strategy

Retrieval combines Qwen embeddings, hybrid retrieval (vector plus keyword), and re-ranking. When there are zero hits, the system explicitly refuses to answer rather than fabricate a response.
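The hybrid step and the zero-hit refusal can be sketched like this. Reciprocal rank fusion is used here as one common way to merge vector and keyword rankings; it is an assumption, since the article does not say how the two result lists are combined. The search and generation callables are placeholders, and the re-ranking stage is omitted.

```python
from typing import Callable, List

REFUSAL = "I could not find relevant information, so I will not answer."


def fuse(vector_hits: List[str], keyword_hits: List[str], k: int = 60) -> List[str]:
    """Merge two ranked lists of document ids with reciprocal rank fusion."""
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def answer(
    query: str,
    vector_search: Callable[[str], List[str]],
    keyword_search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Refuse when retrieval finds nothing; otherwise generate from the fused hits."""
    hits = fuse(vector_search(query), keyword_search(query))
    if not hits:
        return REFUSAL  # never call the generator without supporting documents
    return generate(query, hits)
```

The refusal branch sits before the generator call, so the model is never asked to answer without retrieved context.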

Evaluation System

Metrics are tracked at three layers: intent recognition accuracy, retrieval hit rate, and answer quality (LLM-as-Judge). Evaluations can run automatically on a schedule or be triggered on demand.
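The three metrics can be computed from a list of evaluation cases as in the sketch below. The case fields are hypothetical, the hit rate is defined here as the share of cases with at least one relevant document retrieved, and the judge score is assumed to be a number already produced by an LLM judge; none of these definitions are stated in the article.

```python
from typing import Dict, List


def evaluate(cases: List[dict]) -> Dict[str, float]:
    """Aggregate intent accuracy, retrieval hit rate, and mean judge score."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    n = len(cases)

    # Layer 1: fraction of cases where the predicted intent matches the label.
    intent_accuracy = sum(c["predicted_intent"] == c["expected_intent"] for c in cases) / n

    # Layer 2: fraction of cases where retrieval returned at least one relevant document.
    hit_rate = sum(bool(set(c["retrieved_ids"]) & set(c["relevant_ids"])) for c in cases) / n

    # Layer 3: mean of the scores assigned by the LLM judge.
    answer_quality = sum(c["judge_score"] for c in cases) / n

    return {
        "intent_accuracy": intent_accuracy,
        "retrieval_hit_rate": hit_rate,
        "answer_quality": answer_quality,
    }
```

Reporting the layers separately makes it easier to tell whether a bad answer came from routing, retrieval, or generation.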

Tech Stack

  • Backend: Python, FastAPI, PostgreSQL (pgvector), Celery, LangGraph
  • LLM: Qwen (embedding and dialogue)
  • Frontend: plain HTML/JS
  • Deployment: Docker Compose


Section 07

Design Philosophy: Key Principles for Production-Grade AI Systems

  • Semantic Routing Over Keyword Enumeration: Route requests with LLM semantic understanding, keeping keywords as a fail-safe;
  • Position-Bound Permissions Over Fine-Grained Switches: Salary permissions are bound to positions, which prevents abuse;
  • Defense in Depth and Pre-Gates: Sensitivity checks run up front so that policy is applied consistently;
  • Review Without Automatic Fixes: The review agent only identifies problems; every improvement requires a human decision plus gate verification.

Section 08

Conclusion: Benchmark Practice for Production-Grade AI Agents

This project offers a complete reference for taking AI agents from demo to production. Its value lies in its systematic treatment of production complexity: audit, rollback, permission isolation, and controllable improvement. For developers of enterprise LLM applications, the improvement loop, role separation, and test gate designs are worth studying in depth. The overall lesson is that production systems need to be "manageable, auditable, and controllable."