# From Demo to Production: An HR Multi-Agent Platform with Continuous Evolution Capability

> This article analyzes hr-intelligence-platform, an HR data platform and multi-agent system built for production use rather than for demos. It shows how AI agents can be deployed safely in sensitive business scenarios by combining a human-in-the-loop improvement cycle, complete audit trails, and role-separated governance.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-04T17:45:32.000Z
- Last activity: 2026-06-04T17:48:14.040Z
- Popularity: 163.9
- Keywords: HR agents, LangGraph, human-machine collaboration, continuous improvement, production governance, role separation, audit trails, RAG, multi-agent systems, compliance and security
- Page link: https://www.zingnex.cn/en/forum/thread/demo-hr
- Canonical: https://www.zingnex.cn/forum/thread/demo-hr
- Markdown source: floors_fallback

---

## [Introduction] From Demo to Production: Core Value and Innovations of the HR Multi-Agent Platform

The hr-intelligence-platform project is an HR data platform and multi-agent system built for production environments. Where demo-level agents stop at a working prototype, this project targets safe deployment in sensitive business scenarios through three mechanisms: a human-machine collaborative improvement loop, complete audit trails, and role-separated governance. The core question it answers is how to let agents evolve continuously while keeping them under control.

## Background: Pain Points and Challenges of LLM Agents from Demo to Production

Most LLM applications today remain at the "works in a demo" stage and tend to break down when they meet real-world complexity. The stakes are especially high in HR, where a wrong answer to a salary query or a data leak can cause compliance incidents and legal risk. The project is designed around these pain points and aims to be a complete, production-oriented system.

## System Architecture: Collaborative Design of Data Platform and Multi-Agent System

### HR Data Platform Layer
The data platform manages 84 third-level data categories drawn from four data sources, including Feishu synchronization and manual upload, and organizes them along three business units. Access to salary data requires a secondary verification that expires after 30 minutes (TTL), and permissions are bound to positions so that separation of duties cannot be bypassed.
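The 30-minute secondary verification can be pictured with a minimal sketch. The position name `business_admin`, the `SalaryVerification` class, and the `can_read_salary` function are hypothetical names chosen for illustration; the article only states the TTL and that permissions are bound to positions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SALARY_TTL = timedelta(minutes=30)          # TTL stated in the article
SALARY_POSITIONS = {"business_admin"}       # hypothetical position name


@dataclass(frozen=True)
class SalaryVerification:
    """Records when a user last passed the secondary verification."""
    user_id: str
    verified_at: datetime

    def is_valid(self, now: datetime) -> bool:
        # The verification expires once the TTL has fully elapsed.
        return now - self.verified_at < SALARY_TTL


def can_read_salary(position: str,
                    verification: Optional[SalaryVerification],
                    now: datetime) -> bool:
    """Allow salary access only for bound positions with a fresh verification."""
    if position not in SALARY_POSITIONS:
        return False                        # permission follows the position
    return verification is not None and verification.is_valid(now)


t0 = datetime(2026, 1, 1, 9, 0, tzinfo=timezone.utc)
check = SalaryVerification(user_id="u1", verified_at=t0)
print(can_read_salary("business_admin", check, t0 + timedelta(minutes=10)))
```

Because the check takes `now` as an argument instead of reading the clock itself, expiry behavior is easy to test deterministically.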

### Multi-Agent System Layer
Built on the LangGraph framework, the agent layer uses two-level scheduling with a planner and a supervisor. The planner performs semantic intent recognition, with keyword matching as a fallback, and the supervisor dispatches tasks to five specialized agents, including a parser and a retriever. This split balances flexibility with predictability.
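The planner/supervisor split can be sketched in plain Python, without LangGraph, to show the control flow only. The intent names, the keyword table, and the two stub agents are assumptions made for illustration; the article names only a parser and a retriever among the five agents and does not publish its routing table.

```python
from typing import Callable, Dict, Optional

# Hypothetical keyword table used only when semantic classification fails.
KEYWORD_FALLBACK: Dict[str, str] = {
    "resume": "document_parse",
    "policy": "policy_lookup",
}

# Hypothetical registry; the real system has five specialized agents.
AGENTS: Dict[str, Callable[[str], str]] = {
    "document_parse": lambda q: f"parser handled: {q}",
    "policy_lookup": lambda q: f"retriever handled: {q}",
}


def plan(query: str, classify: Callable[[str], Optional[str]]) -> str:
    """Planner: try semantic classification first, keywords second."""
    try:
        intent = classify(query)            # e.g. an LLM call in production
    except Exception:
        intent = None                       # classifier outage: fall through
    if intent:
        return intent
    lowered = query.lower()
    for keyword, fallback_intent in KEYWORD_FALLBACK.items():
        if keyword in lowered:
            return fallback_intent
    return "unknown"


def supervise(intent: str, query: str) -> str:
    """Supervisor: dispatch to the agent registered for the intent."""
    agent = AGENTS.get(intent)
    if agent is None:
        return "No agent is registered for this request."
    return agent(query)


def offline_classifier(query: str) -> Optional[str]:
    raise RuntimeError("LLM unavailable")   # simulates a classifier failure


question = "What is the leave policy?"
print(supervise(plan(question, offline_classifier), question))
```

Keeping the keyword table as a fallback rather than the primary router means routing still works, in a degraded but predictable way, when the semantic classifier is unavailable.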

## Core Innovation: Human-Machine Collaborative Improvement Loop Mechanism

### Tracking and Feedback Collection
Each run produces a detailed execution trace. Queries are stored as hashes so that traces do not expose sensitive information, and users can rate each answer with a thumbs-up or thumbs-down.
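A trace record with a hashed query might look like the sketch below. The article says only that query hashing is used; the choice of keyed HMAC-SHA256, the normalization step, and all field names here are assumptions for illustration.

```python
import hashlib
import hmac
import json
import time
import uuid
from typing import List


def hash_query(query: str, key: bytes) -> str:
    """Return a keyed hash of the normalized query text."""
    normalized = " ".join(query.split()).lower()
    # A keyed hash (assumption) makes guessing short queries harder
    # than a bare SHA-256 would.
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trace(query: str, intent: str, steps: List[str], key: bytes) -> dict:
    """Build a trace that stores the query hash, never the raw query."""
    return {
        "trace_id": uuid.uuid4().hex,
        "query_hash": hash_query(query, key),
        "intent": intent,
        "steps": steps,
        "feedback": None,                   # later set to "up" or "down"
        "created_at": time.time(),
    }


trace = build_trace("What is Alice's salary?", "salary_query",
                    ["plan", "retrieve", "answer"], key=b"demo-key")
print(json.dumps(trace, indent=2))
```

Identical queries map to the same hash, so negative feedback can still be grouped per query without storing the text itself.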

### Automatic Review Agent
Once a week, a review agent clusters the negative cases and produces a dual-view output: a business summary for HR decision-makers and a technical detail view for engineers who will fix the problem.
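The dual-view output can be illustrated with a deliberately simple stand-in. The article does not describe its clustering method, so this sketch groups disliked traces by exact `(intent, failed_step)` pairs instead of real clustering, and every field name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def review(negative_traces: List[dict]) -> List[dict]:
    """Group disliked traces and emit one two-view finding per group."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for trace in negative_traces:
        groups[(trace["intent"], trace["failed_step"])].append(trace["trace_id"])

    findings = []
    # Largest groups first, ties broken by key for stable output.
    for (intent, step), ids in sorted(groups.items(),
                                      key=lambda kv: (-len(kv[1]), kv[0])):
        findings.append({
            # View for HR decision-makers: plain language, no internals.
            "business_summary": f"{len(ids)} disliked answers for '{intent}' requests",
            # View for engineers: where it failed and which traces to open.
            "technical_details": {
                "intent": intent,
                "failed_step": step,
                "trace_ids": ids,
            },
        })
    return findings


sample = [
    {"trace_id": "t1", "intent": "policy_lookup", "failed_step": "retrieve"},
    {"trace_id": "t2", "intent": "policy_lookup", "failed_step": "retrieve"},
    {"trace_id": "t3", "intent": "document_parse", "failed_step": "parse"},
]
for finding in review(sample):
    print(finding["business_summary"])
```

Note that the function only reports findings; consistent with the project's design, it changes nothing in the system.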

### Improvement Work Orders and Test Gates
Business administrators review the findings and convert them into work orders. Test gates are hard rules: a modification cannot be released if its tests fail, which enforces the process rather than relying on convention.

## Role Separation and Compliance Governance: Security Assurance for Sensitive Scenarios

The system defines three roles:
- **Business Administrators**: Can access salary data, subject to the TTL-limited verification. All of their operations are audited, and the audit log records the behavior, not the salary values.
- **Technical Administrators**: Operate and maintain the system but cannot view salary values, which keeps duties isolated.
- **General Employees**: Can access only the operational data relevant to them. Salary data is isolated at three levels: intent classification, returned fields, and the interface layer.
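Of the three isolation levels, the returned-fields level is the easiest to sketch. The role names, the salary field names, and both functions below are hypothetical; the sketch also shows an audit entry that records which fields were touched without storing their values.

```python
from typing import Dict, Iterable

# Hypothetical names for salary-bearing fields.
SALARY_FIELDS = frozenset({"base_salary", "bonus"})


def visible_fields(record: Dict[str, object], role: str,
                   salary_verified: bool = False) -> Dict[str, object]:
    """Strip salary fields unless a verified business administrator asks."""
    if role == "business_admin" and salary_verified:
        return dict(record)
    # Technical administrators and general employees never see salary values.
    return {k: v for k, v in record.items() if k not in SALARY_FIELDS}


def audit_entry(actor: str, role: str, action: str,
                fields: Iterable[str]) -> dict:
    """Record behavior (who, what, which fields), never the field values."""
    return {"actor": actor, "role": role, "action": action,
            "fields": sorted(fields)}


record = {"name": "A. Example", "department": "Ops",
          "base_salary": 1000, "bonus": 50}
print(visible_fields(record, "technical_admin"))
print(audit_entry("u1", "business_admin", "read",
                  visible_fields(record, "business_admin", salary_verified=True)))
```

Filtering by default and allowing only one explicit exception means a new or misspelled role fails closed rather than open.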

## Technical Implementation Details: RAG, Evaluation, and Tech Stack

### RAG Strategy
Retrieval combines Qwen embeddings, hybrid retrieval (vector plus keyword), and re-ranking. When retrieval returns zero hits, the system explicitly refuses to answer rather than fabricate a response.
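The refuse-on-zero-hits rule can be sketched as follows. The article does not say how vector and keyword results are merged, so this example uses reciprocal rank fusion (RRF) as a stand-in, omits the re-ranking step, and takes the search and generation functions as injected callables with hypothetical names.

```python
from typing import Callable, Dict, List

REFUSAL = "I could not find relevant information, so I cannot answer this question."


def rrf_fuse(vector_hits: List[str], keyword_hits: List[str],
             k: int = 60) -> List[str]:
    """Merge two ranked ID lists with reciprocal rank fusion (an assumption)."""
    scores: Dict[str, float] = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def answer(query: str,
           vector_search: Callable[[str], List[str]],
           keyword_search: Callable[[str], List[str]],
           generate: Callable[[str, List[str]], str]) -> str:
    """Answer only when retrieval found something; otherwise refuse."""
    fused = rrf_fuse(vector_search(query), keyword_search(query))
    if not fused:
        return REFUSAL                      # zero hits: never call the LLM
    return generate(query, fused)


print(rrf_fuse(["a", "b"], ["b", "c"]))
print(answer("unknown topic", lambda q: [], lambda q: [],
             lambda q, docs: "generated"))
```

Placing the refusal before the generation call means the model is never given the chance to answer without supporting documents.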

### Evaluation System
Evaluation uses three layers of metrics: intent recognition accuracy, retrieval hit rate, and answer quality scored by an LLM acting as judge (LLM-as-Judge). Evaluations can run on a schedule or be triggered on demand.
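The first two metric layers are simple ratios and can be sketched directly; the third (LLM-as-Judge) needs a model call and is left out. The evaluation case format and field names below are assumptions, and "hit" is taken here to mean that at least one relevant document was retrieved.

```python
from typing import List


def intent_accuracy(cases: List[dict]) -> float:
    """Fraction of cases where the predicted intent matches the expected one."""
    if not cases:
        return 0.0
    correct = sum(c["predicted_intent"] == c["expected_intent"] for c in cases)
    return correct / len(cases)


def retrieval_hit_rate(cases: List[dict]) -> float:
    """Fraction of cases where at least one relevant document was retrieved."""
    if not cases:
        return 0.0
    hits = sum(bool(set(c["retrieved_ids"]) & set(c["relevant_ids"]))
               for c in cases)
    return hits / len(cases)


cases = [
    {"predicted_intent": "policy_lookup", "expected_intent": "policy_lookup",
     "retrieved_ids": ["d1", "d2"], "relevant_ids": ["d2"]},
    {"predicted_intent": "document_parse", "expected_intent": "policy_lookup",
     "retrieved_ids": ["d9"], "relevant_ids": ["d3"]},
]
print(intent_accuracy(cases), retrieval_hit_rate(cases))
```

Measuring the layers separately helps localize a regression: a drop in answer quality with stable intent accuracy and hit rate points at generation rather than routing or retrieval.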

### Tech Stack
- Backend: Python, FastAPI, PostgreSQL (pgvector), Celery, LangGraph
- LLM: Qwen (embedding and dialogue)
- Frontend: vanilla HTML/JS
- Deployment: Docker Compose

## Design Philosophy: Key Principles for Production-Grade AI Systems

- **Semantic Routing Over Keyword Enumeration**: Route requests by LLM semantic understanding, with keywords as a fallback.
- **Position-Bound Permissions Over Fine-Grained Switches**: Salary permissions are bound to positions rather than toggled per user, which limits abuse.
- **Defense in Depth and Pre-Gates**: Sensitive checks run up front, before any agent acts, so that policy is applied consistently.
- **Review Without Automatic Fixes**: The review agent only identifies problems. Improvements require a human decision plus gate verification.

## Conclusion: Benchmark Practice for Production-Grade AI Agents

The project offers a complete reference for taking AI agents from demo to production. Its value lies in systematic treatment of production complexity: audit, rollback, permission isolation, and controllable improvement. For developers of enterprise LLM applications, the improvement loop, role separation, and test gate designs are worth studying closely. The underlying message is that production systems need to be manageable, auditable, and controllable.