EHRGym: A Training Sandbox for Medical AI Agents to Learn Operating Electronic Health Record Systems

EHRGym is a containerized reinforcement learning environment for training and evaluating computer-use agents that carry out clinical workflows in an Epic-like electronic health record (EHR) system. It supports GRPO training and integrates natively with the TRL framework.

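Tags: EHRGym, medical AI, electronic health records, reinforcement learning, OpenEnv, GRPO, computer-use agents, clinical workflows, synthetic data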
Published 2026-04-04 04:45 | Recent activity 2026-04-04 04:49 | Estimated read 5 min

Section 01

Introduction: EHRGym — A Training Sandbox for Medical AI Agents

EHRGym is a containerized reinforcement learning environment for training computer-use agents to operate an Epic-like electronic health record (EHR) system. It supports GRPO training and integrates with the TRL framework. It targets two core obstacles to deploying medical AI, the complex interaction model of real EHRs and their compliance sensitivity, by providing realistic and secure training scenarios.

Section 02

Core Dilemmas in Medical AI Deployment

Translating AI into clinical practice is hard. Real EHR systems have complex interfaces, hold sensitive data, and are subject to strict compliance requirements, so researchers can rarely train and test agents on them directly. Traditional simulations, meanwhile, fail to capture the details of real workflows, such as multi-step decision-making and navigation across modules.

Section 03

Architecture and Standards of EHRGym

EHRGym uses a dual-service containerized design. A Next.js EHR application mimics Epic's layout and interactions, including modules such as the patient list and chart review. An OpenEnv environment server exposes the standard interface, such as reset() and step(). Following the OpenEnv standard keeps EHRGym interoperable with the rest of that ecosystem, and native integration with the TRL library supports GRPO fine-tuning.
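
To make the reset()/step() contract concrete, here is a minimal sketch of an agent loop against a toy environment. Everything in it is an assumption for illustration: ToyEHREnv, StepResult, the routes, and the action dictionaries are hypothetical and are not the EHRGym or OpenEnv API.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    observation: dict  # what the agent sees after the action
    reward: float      # scalar feedback for this step
    done: bool         # True once the episode has ended


@dataclass
class ToyEHREnv:
    """Hypothetical stand-in for an EHR environment; not the EHRGym API."""
    goal_route: str = "/patients/123/orders"
    max_steps: int = 10
    route: str = field(default="/patients", init=False)
    steps: int = field(default=0, init=False)

    def _observe(self) -> dict:
        return {"route": self.route, "goal": f"Open {self.goal_route}"}

    def reset(self) -> dict:
        # Every episode starts from the patient list.
        self.route = "/patients"
        self.steps = 0
        return self._observe()

    def step(self, action: dict) -> StepResult:
        self.steps += 1
        if action.get("type") == "goto":
            self.route = action["url"]
        success = self.route == self.goal_route
        # The episode ends on success or when the step budget runs out.
        done = success or self.steps >= self.max_steps
        return StepResult(self._observe(), 1.0 if success else 0.0, done)


def run_episode(env: ToyEHREnv, policy) -> float:
    """Roll out one episode and return the summed reward."""
    obs = env.reset()
    total = 0.0
    while True:
        result = env.step(policy(obs))
        total += result.reward
        if result.done:
            return total
        obs = result.observation


def scripted_policy(obs: dict) -> dict:
    # A fixed policy that jumps straight to the goal route.
    return {"type": "goto", "url": "/patients/123/orders"}


episode_return = run_episode(ToyEHREnv(), scripted_policy)
```

In a real setup the policy would be a model choosing actions from the observation, and the environment would drive a browser session against the EHR application instead of tracking a route string.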

Section 04

Progressive Task Design

The task library is organized into three stages: unit skills (basic navigation and filtering), single objectives (placing an order or completing a document), and multi-step workflows (full clinical processes). Each task has its own scoring criteria. Rewards combine terminal success with progress along the way, and invalid operations and errors are penalized.
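
One way to combine these three signals is a weighted sum. The sketch below is an assumption about how such a score could be composed, not EHRGym's actual scoring code; the weights, the checkpoint notion, and the names RewardWeights and episode_reward are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Illustrative values only; the source does not give real weights.
    terminal: float = 1.0          # bonus for completing the task
    progress: float = 0.5          # scaled by the fraction of checkpoints reached
    invalid_penalty: float = 0.05  # per invalid operation
    error_penalty: float = 0.2     # per error


def episode_reward(
    success: bool,
    checkpoints_hit: int,
    checkpoints_total: int,
    invalid_actions: int,
    errors: int,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Terminal success plus partial progress, minus penalties."""
    progress = checkpoints_hit / checkpoints_total if checkpoints_total else 0.0
    reward = w.terminal * float(success) + w.progress * progress
    reward -= w.invalid_penalty * invalid_actions + w.error_penalty * errors
    return reward


full_credit = episode_reward(True, 4, 4, invalid_actions=0, errors=0)
partial = episode_reward(False, 2, 4, invalid_actions=2, errors=1)
```

With these example weights, a clean successful run scores 1.5, while a failed run that reaches half the checkpoints and makes mistakes ends up slightly negative. The progress term gives the agent a learning signal on long workflows where outright success is initially rare.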

Section 05

Synthetic Data Strategy: Balancing Reality and Privacy

EHRGym uses Synthea to generate synthetic patient records in FHIR format. Because the records describe no real patients, they avoid the privacy risk of real data while remaining scalable and controllable. Standard terminologies such as LOINC, SNOMED CT, and RxNorm keep the records realistic, and clinical documents are generated from structured templates.
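
To show what coded FHIR data looks like in practice, the sketch below filters Observation resources in a FHIR Bundle by LOINC code. The bundle is a tiny hand-written example, not actual Synthea output, and the helper observations_with_code is a hypothetical name; only the FHIR field layout (entry, resource, code.coding) and the LOINC system URI come from the FHIR standard.

```python
import json

LOINC = "http://loinc.org"  # FHIR system URI for LOINC codes

# Hand-written miniature bundle for illustration; real Synthea bundles are far larger.
bundle_json = """
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "example-patient"}},
    {"resource": {
      "resourceType": "Observation",
      "status": "final",
      "code": {"coding": [
        {"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}
      ]},
      "valueQuantity": {"value": 72, "unit": "/min"}
    }}
  ]
}
"""


def observations_with_code(bundle: dict, system: str, code: str) -> list:
    """Return Observation resources carrying the given (system, code) pair."""
    matches = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Observation":
            continue
        codings = resource.get("code", {}).get("coding", [])
        if any(c.get("system") == system and c.get("code") == code for c in codings):
            matches.append(resource)
    return matches


bundle = json.loads(bundle_json)
heart_rates = observations_with_code(bundle, LOINC, "8867-4")
```

Matching on both system and code, rather than on display text, is what makes standard terminologies useful here: a task checker can look up a lab value or vital sign without depending on how it is labeled in the interface.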

Section 06

Technical Implementation Details

The action space includes low-level mouse and keyboard operations as well as high-level semantic actions. The observation space includes the task goal text, screenshots, the current route, and more. The reward design combines sparse terminal rewards, dense process rewards, and penalties.
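
A mixed action space like this can be modeled as a small set of typed records. The sketch below is only one possible shape: the class names, fields, and the describe helper are hypothetical and do not reflect EHRGym's real schema.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Click:
    """Low-level mouse action at pixel coordinates."""
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    """Low-level keyboard action."""
    text: str


@dataclass(frozen=True)
class SemanticAction:
    """High-level action named by intent rather than by pixels."""
    name: str    # hypothetical example: "open_patient"
    target: str  # hypothetical example: a patient identifier


Action = Union[Click, TypeText, SemanticAction]


@dataclass(frozen=True)
class Observation:
    goal: str                               # task goal text
    route: str                              # current route in the EHR app
    screenshot_png: Optional[bytes] = None  # raw screenshot bytes, if captured


def describe(action: Action) -> str:
    """Render an action as a short log string."""
    if isinstance(action, Click):
        return f"click({action.x}, {action.y})"
    if isinstance(action, TypeText):
        return f"type({action.text!r})"
    if isinstance(action, SemanticAction):
        return f"{action.name}({action.target!r})"
    raise TypeError(f"unsupported action: {action!r}")


log_line = describe(SemanticAction("open_patient", "123"))
```

Keeping low-level and high-level actions in one union lets the same agent loop and logging code serve both a pixel-driven agent and one that acts through semantic commands.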

Section 07

Application Scenarios and Potential Impact

Potential uses include clinical decision support (assisting with information extraction and decision-making), interface optimization (analyzing agent behavior to improve EHR design), medical education (virtual training), and multimodal AI (extending support to data such as medical images).

Section 08

Limitations and Future Outlook

Current non-goals: EHRGym is not a pixel-perfect clone of Epic and does not offer full enterprise EHR functionality. Future directions: expand the clinical scenarios, integrate medical knowledge bases, enable multi-agent collaboration, and introduce time and resource constraints to better simulate real environments.