Zing Forum

Email Triage OpenEnv: A Reinforcement Learning Environment for Training AI Agents to Handle Real Customer Service Emails

Email Triage OpenEnv is an open-source reinforcement learning environment for training and evaluating AI agents on customer service email tasks such as classification, reply, and escalation. It offers three tasks of increasing difficulty and a fine-grained reward mechanism.

Tags: Email Triage, OpenEnv, AI agents, reinforcement learning, customer service automation, email classification, LLM evaluation, benchmarking
Published 2026-03-31 06:45 | Recent activity 2026-03-31 06:53 | Estimated read 9 min

Section 02

Background: Real-World Capability Testing of AI Agents

Current large language model (LLM) benchmarks mostly focus on "academic" tasks such as knowledge Q&A, code generation, and mathematical reasoning. When AI agents are deployed in real work scenarios, however, these benchmarks often fail to capture the complexity involved.

Customer service email handling is a typical example. This work requires:

  • Understanding the urgency and business type of emails
  • Distinguishing between real security alerts and phishing emails
  • Responding to customers in an appropriate tone
  • Routing issues to the correct team
  • Maintaining context coherence across multiple related emails

These tasks seem simple, but they involve multi-step decision-making, context understanding, and complex state management. More importantly, the cost of mistakes is high: marking important emails as spam may lead to customer churn, while failing to identify phishing emails may pose security risks.

Section 03

Introduction to Email Triage OpenEnv

Email Triage OpenEnv is an open-source reinforcement learning environment designed for training and evaluating how well AI agents handle customer service emails. It simulates a real inbox in which the agent classifies, replies to, and escalates emails through a sequence of actions.

This project is part of the OpenEnv ecosystem, which is a set of open environment standards for evaluating AI agents' performance in real-world tasks.

Section 04

Observation Space

At each step, the agent can observe the following information:

  1. inbox_summary: Inbox overview containing metadata of all emails

    • Email ID, subject, sender, timestamp
    • Read status, priority label, category label
    • Whether archived, marked as spam, or escalated
    • Whether replied to
  2. current_email: Full content of the currently focused email

    • Email body, thread ID, attachment list
  3. inbox_stats: Inbox statistics

  4. task_objective: Human-readable description of the current task's goal

  5. last_action_result: Feedback from the previous action

  6. available_actions: List of currently available actions
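The six observation fields above could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the class names, the attribute names inside `EmailSummary` and `CurrentEmail`, the field types, and the `unread_ids` helper are all assumptions made for illustration. Only the six top-level field names come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EmailSummary:
    """One entry of inbox_summary: metadata only, no body (attribute names are assumed)."""
    email_id: str
    subject: str
    sender: str
    timestamp: str
    is_read: bool = False
    priority: Optional[str] = None   # label set by the agent, if any
    category: Optional[str] = None   # label set by the agent, if any
    archived: bool = False
    flagged_spam: bool = False
    escalated: bool = False
    replied: bool = False


@dataclass
class CurrentEmail:
    """Full content of the currently focused email (attribute names are assumed)."""
    email_id: str
    body: str
    thread_id: str
    attachments: list[str] = field(default_factory=list)


@dataclass
class Observation:
    """The six top-level fields listed in the article; the types are assumptions."""
    inbox_summary: list[EmailSummary]
    current_email: Optional[CurrentEmail]
    inbox_stats: dict[str, int]
    task_objective: str
    last_action_result: str
    available_actions: list[str]


def unread_ids(obs: Observation) -> list[str]:
    """Hypothetical helper: IDs of emails the agent has not read yet."""
    return [e.email_id for e in obs.inbox_summary if not e.is_read]
```

An agent policy might call a helper like `unread_ids` to decide which email to `focus` on next.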

Section 05

Action Space

The environment defines 9 core actions:

| Action | Parameters | Description |
| --- | --- | --- |
| focus | email_id | Read the specified email |
| classify | priority, category | Mark email priority and category |
| reply | body, tone | Send a reply (supports formal/friendly/apologetic/escalating tones) |
| escalate | escalate_to, note | Escalate to the specified team (manager/legal/technical_team/billing_team) |
| flag_spam | confidence | Mark as spam (heavy penalty for false positives) |
| archive | reason | Archive the email (resolved/irrelevant/spam) |
| mark_read | (none) | Mark as read |
| snooze | duration_hours | Postpone processing |
| noop | (none) | Do nothing (small penalty) |
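To make the action table concrete, here is a minimal sketch of how an agent-side wrapper might check an action before sending it. The action names, parameter names, and enumerated values are taken from the table. The `validate_action` function, the assumption that every listed parameter is required, and the error message format are hypothetical and not part of the environment's documented API.

```python
# Parameter names per action, as listed in the action table.
ACTION_PARAMS = {
    "focus": ("email_id",),
    "classify": ("priority", "category"),
    "reply": ("body", "tone"),
    "escalate": ("escalate_to", "note"),
    "flag_spam": ("confidence",),
    "archive": ("reason",),
    "mark_read": (),
    "snooze": ("duration_hours",),
    "noop": (),
}

# Enumerated values that the table spells out explicitly.
ALLOWED_VALUES = {
    ("reply", "tone"): {"formal", "friendly", "apologetic", "escalating"},
    ("escalate", "escalate_to"): {"manager", "legal", "technical_team", "billing_team"},
    ("archive", "reason"): {"resolved", "irrelevant", "spam"},
}


def validate_action(name, params):
    """Return a list of problems; an empty list means the action looks well-formed.

    Assumption: all listed parameters are required. The real environment
    may treat some of them as optional.
    """
    if name not in ACTION_PARAMS:
        return [f"unknown action: {name}"]
    problems = []
    expected = ACTION_PARAMS[name]
    for key in expected:
        if key not in params:
            problems.append(f"missing parameter: {key}")
    for key in params:
        if key not in expected:
            problems.append(f"unexpected parameter: {key}")
    for key, value in params.items():
        allowed = ALLOWED_VALUES.get((name, key))
        if allowed is not None and value not in allowed:
            problems.append(f"invalid {key}: {value!r}")
    return problems
```

Rejecting malformed actions before they reach the environment can save steps, which matters because every task has a fixed step budget.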

Section 06

Three Task Difficulty Levels

The environment provides three tasks of increasing difficulty:

Task 1: Basic Triage

  • Inbox size: 10 emails
  • Max steps: 30
  • Goal: Read and classify all emails (priority + category)
  • Scoring criteria: Classification accuracy
  • Expected score: 0.4 - 0.9
  • Key challenge: Distinguish urgent/high/normal priorities from email tone

Task 2: Reply and Escalate

  • Inbox size: 15 emails (including Task 1 emails)
  • Max steps: 50
  • Goal: Classify all emails, reply to customer inquiries, escalate key issues, mark/archive spam
  • Scoring criteria: Classification (40%) + Reply quality (30%) + Escalation routing (20%) + Spam detection (10%)
  • Expected score: 0.3 - 0.75
  • Key challenge: Identify correct escalation targets and reply tones
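The Task 2 scoring criteria amount to a weighted sum of four component scores. The sketch below shows that arithmetic, assuming each component is already normalized to the range 0 to 1. The weights come from the list above, while the component key names and the `weighted_score` function are hypothetical.

```python
# Weights from the Task 2 scoring criteria; key names are illustrative.
TASK2_WEIGHTS = {
    "classification": 0.40,
    "reply_quality": 0.30,
    "escalation_routing": 0.20,
    "spam_detection": 0.10,
}


def weighted_score(components, weights):
    """Combine per-component scores (each assumed to be in [0, 1]) into one score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for name, weight in weights.items():
        value = components.get(name, 0.0)  # a missing component counts as 0
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total


example = {
    "classification": 0.8,
    "reply_quality": 0.5,
    "escalation_routing": 1.0,
    "spam_detection": 0.0,
}
# 0.40*0.8 + 0.30*0.5 + 0.20*1.0 + 0.10*0.0 = 0.67
print(round(weighted_score(example, TASK2_WEIGHTS), 2))
```

Because classification carries 40% of the weight, an agent that replies well but misclassifies most emails would still land in the lower part of the expected score range.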

Task 3: Full Workflow

  • Inbox size: 20 emails (including Task 1 and 2 emails)
  • Max steps: 80
  • Goal: On top of Task 2, handle traps and maintain thread continuity
  • Scoring criteria: Task 2 score (70%) + Trap handling (15%) + Thread continuity (10%) + Multi-action completeness (5%)
  • Expected score: 0.2 - 0.65
  • Key traps:
    • t3_e16: Legitimate security alert (from security-noreply@ourcompany-platform.com); falsely flagging it as spam deducts 0.30 points
    • t3_e17: Phishing email posing as an internal IT message (sent from a .ru domain); must be marked correctly
    • t3_e18: Follow-up to an earlier billing dispute; requires both an escalation and a reply
    • t3_e20: Server failure; requires escalation to both technical_team and manager
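The four traps and the Task 3 weighting can be sketched as a check over an action log. This is only an illustration under stated assumptions: the log format (a list of `(email_id, action, params)` tuples), the `check_traps` and `task3_score` functions, and the choice to score trap handling as the fraction of traps passed are all hypothetical. The email IDs, required actions, and the 70/15/10/5 weights come from the list above.

```python
def check_traps(log):
    """Map each trap email ID to True if the logged actions handle it as described.

    `log` is an assumed format: a list of (email_id, action_name, params) tuples.
    """
    def actions_on(email_id):
        return [(a, p) for e, a, p in log if e == email_id]

    def did(email_id, action):
        return any(a == action for a, _ in actions_on(email_id))

    def escalation_targets(email_id):
        return {p.get("escalate_to") for a, p in actions_on(email_id) if a == "escalate"}

    return {
        # Legitimate security alert: must NOT be flagged as spam.
        "t3_e16": not did("t3_e16", "flag_spam"),
        # Phishing email: treated here as "must be flagged as spam" (assumption).
        "t3_e17": did("t3_e17", "flag_spam"),
        # Billing dispute follow-up: needs both an escalation and a reply.
        "t3_e18": did("t3_e18", "escalate") and did("t3_e18", "reply"),
        # Server failure: needs escalation to both teams.
        "t3_e20": {"technical_team", "manager"} <= escalation_targets("t3_e20"),
    }


def task3_score(task2_score, trap_results, thread_continuity, multi_action):
    """Weighted Task 3 score; trap handling as fraction passed is an assumption."""
    trap_fraction = sum(trap_results.values()) / len(trap_results)
    return (0.70 * task2_score
            + 0.15 * trap_fraction
            + 0.10 * thread_continuity
            + 0.05 * multi_action)
```

Note that in this sketch t3_e16 counts as passed whenever it is simply never flagged, so an agent that ignores the email still gets credit for that trap. The actual grader may be stricter.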

Section 07

Reward Mechanism Design

The environment uses fine-grained reward shaping: it provides a feedback signal throughout the task rather than only a total score at the end.

Section 08

Positive Rewards

| Event | Reward |
| --- | --- |
| Correct priority classification | +0.10 |
| Correct category classification | +0.10 |
| Correct reply tone | +0.08 |
| Non-empty reply content | +0.04 |
| Correct escalation team | +0.12 |
| True positive spam marking | +0.10 |
| Correct archiving (spam/resolved) | +0.05 |
| Read email (focus action) | +0.01 |
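As a rough illustration of how these per-event rewards add up over an episode, the sketch below sums the rewards triggered by each step. The reward values are taken from the table. The event key names and the `step_reward` function are hypothetical, and the sketch covers only the positive events listed here, not any penalties.

```python
# Reward values from the positive rewards table; key names are illustrative.
POSITIVE_REWARDS = {
    "correct_priority": 0.10,
    "correct_category": 0.10,
    "correct_reply_tone": 0.08,
    "non_empty_reply": 0.04,
    "correct_escalation_team": 0.12,
    "true_positive_spam": 0.10,
    "correct_archive": 0.05,
    "read_email": 0.01,
}


def step_reward(events):
    """Sum the rewards for the events triggered by a single step.

    Raises KeyError for an event name that is not in the table.
    """
    return sum(POSITIVE_REWARDS[e] for e in events)


# A hypothetical three-step episode on one email:
episode = [
    ["read_email"],                              # focus
    ["correct_priority", "correct_category"],    # classify, both labels right
    ["correct_reply_tone", "non_empty_reply"],   # reply
]
cumulative = 0.0
for step, events in enumerate(episode, start=1):
    cumulative += step_reward(events)
    print(f"step {step}: cumulative reward = {cumulative:.2f}")
```

In this example the agent earns 0.01 for reading, 0.20 for a fully correct classification, and 0.12 for the reply, so it gets a signal at each step instead of waiting for the final score.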