Zing Forum

Triagegeist: A Practical Project for Predicting Emergency Triage Severity Using Machine Learning

A machine learning project that predicts the emergency severity (ESI classification) of emergency patients based on structured clinical data, using LightGBM, XGBoost, and neural networks, achieving a Macro F1 score of 0.973 in the Kaggle competition.

machine learning, healthcare, emergency department, triage, LightGBM, XGBoost, clinical data, ESI, feature engineering
Published 2026-06-04 11:46 | Recent activity 2026-06-04 11:50 | Estimated read 7 min
1

Section 01

Introduction: Triagegeist: A Practical Project for Predicting Emergency Triage Severity Using Machine Learning

A machine learning project that predicts the emergency severity (ESI classification) of emergency patients based on structured clinical data, using LightGBM, XGBoost, and neural networks, achieving a Macro F1 score of 0.973 in the Kaggle competition.

2

Section 02

Original Author and Source

  • Original Author/Maintainer: Pyxis567
  • Source Platform: GitHub
  • Original Title: triagegeist
  • Original Link: https://github.com/Pyxis567/triagegeist
  • Publication Time: June 2026
  • Related Competition: Kaggle Triagegeist Competition

3

Section 03

Project Background and Significance

Emergency triage is one of the most critical steps in hospital operations. When patients flood into the emergency department, nurses have only minutes to decide who needs immediate treatment and who can wait. Traditional triage relies on individual clinical experience, and under high patient volume it is hard to keep those judgments accurate and consistent.

The Triagegeist project targets this problem by using machine learning models to assist, or even replace, the manual triage process. It was built for the Triagegeist competition on Kaggle, where the goal is to predict the assigned emergency severity level (ESI 1-5) from structured clinical data collected at the triage point.


4

Section 04

Introduction to the ESI Classification System

ESI (Emergency Severity Index) is a five-level triage system widely used in the field of emergency medicine in the United States:

Level | Label             | Description
1     | Resuscitation     | Immediately life-threatening; requires immediate life-saving intervention
2     | Emergent          | High risk; should not wait
3     | Urgent but stable | Stable, but requires multiple resources
4     | Semi-urgent       | Stable; requires only one resource
5     | Non-urgent        | Stable; no resources needed

This classification determines the order in which patients are seen and directly affects treatment outcomes.


5

Section 05

Dataset Composition and Feature Engineering

The project uses a dataset of 80,000 training records and 20,000 test records. The raw data includes:

  • Training Set: 80,000 labeled patient records (40 features + target variable)
  • Test Set: 20,000 unlabeled records for submission
  • Chief Complaint Text: Original free-text chief complaint of each patient
  • Medical History Records: 25 binary comorbidity markers

6

Section 06

Core Feature Groups

The original features cover various aspects of emergency triage:

  1. Vital Signs: Blood pressure, heart rate, oxygen saturation, body temperature, respiratory rate
  2. Demographics: Age, gender, insurance type
  3. Clinical Scores: NEWS2 score, GCS score, pain score
  4. Arrival Context: Arrival method, time, shift
  5. Past Utilization: Number of emergency visits and hospitalizations in the past 12 months

7

Section 07

Highlights of Feature Engineering

The project builds 297 features in total. The main groups are:

  • Missing Value Indicators: Binary flags for key measurements such as blood pressure, respiratory rate, and body temperature, since missingness itself is related to triage level
  • Median Imputation: Fitted only on the training set to avoid data leakage
  • Time Features: Flags for daytime (hours 8-17), evening (hours 18-22), and night arrivals
  • Age Binning: Age divided into 8 groups (infant to elderly) and one-hot encoded
  • Vital Sign Interactions: Derived features such as the pulse pressure ratio and the product of mean arterial pressure (MAP) and heart rate
  • Comorbidity Burden: Sum of the 25 medical history markers
  • Past Utilization Ratio: Number of admissions / (number of emergency visits + 1)
  • NEWS2 Risk Markers: High NEWS2 score (≥7), medium NEWS2 score (5-6)
  • qSOFA Score: Simplified score for sepsis screening
  • Cross-score Interactions: Pain × NEWS2, GCS × NEWS2, comorbidity × NEWS2, etc.
  • Age-stratified Features: Child/elderly markers, PALS-adjusted heart rate and respiratory rate thresholds
  • TF-IDF Text Features: Extracted 100 unigram + bigram features from chief complaint text
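
Several of the tabular features listed above can be sketched in a few lines. The following is a minimal illustration in plain Python, not the project's code: every field name (systolic_bp, arrival_hour, prior_ed_visits, and so on) is hypothetical, the pulse pressure ratio is assumed to mean pulse pressure divided by systolic pressure, and "night" is assumed to cover the hours outside 8-22. The qSOFA criteria used are the standard ones (respiratory rate ≥ 22, systolic pressure ≤ 100 mmHg, GCS < 15).

```python
from statistics import median

# Hypothetical column names; the real dataset may name these differently.
VITALS = ["systolic_bp", "diastolic_bp", "heart_rate", "resp_rate", "temperature"]


def fit_medians(train_rows):
    """Compute per-column medians from training rows only (avoids leakage)."""
    return {
        col: median(r[col] for r in train_rows if r.get(col) is not None)
        for col in VITALS
    }


def build_features(row, medians):
    """Derive a handful of the listed features for one patient record."""
    feats = {}
    for col in VITALS:
        value = row.get(col)
        feats[f"{col}_missing"] = int(value is None)        # missing-value indicator
        feats[col] = medians[col] if value is None else value  # median imputation

    sbp, dbp, hr = feats["systolic_bp"], feats["diastolic_bp"], feats["heart_rate"]
    pulse_pressure = sbp - dbp
    mean_arterial_pressure = dbp + pulse_pressure / 3
    feats["pulse_pressure_ratio"] = pulse_pressure / sbp if sbp else 0.0  # assumed definition
    feats["map_x_hr"] = mean_arterial_pressure * hr

    hour = row["arrival_hour"]
    feats["is_daytime"] = int(8 <= hour <= 17)
    feats["is_evening"] = int(18 <= hour <= 22)
    feats["is_night"] = int(hour >= 23 or hour < 8)  # assumed: all remaining hours

    feats["comorbidity_burden"] = sum(row["comorbidities"])
    feats["admit_ratio"] = row["prior_admissions"] / (row["prior_ed_visits"] + 1)

    news2 = row["news2"]
    feats["news2_high"] = int(news2 >= 7)
    feats["news2_medium"] = int(5 <= news2 <= 6)
    feats["pain_x_news2"] = row["pain"] * news2

    # qSOFA: one point each for fast breathing, low systolic pressure, altered mentation.
    feats["qsofa"] = (
        int(feats["resp_rate"] >= 22) + int(sbp <= 100) + int(row["gcs"] < 15)
    )
    return feats


# Toy data for illustration only.
train_rows = [
    {"systolic_bp": 120, "diastolic_bp": 80, "heart_rate": 70, "resp_rate": 16, "temperature": 36.8},
    {"systolic_bp": 100, "diastolic_bp": 60, "heart_rate": 110, "resp_rate": 24, "temperature": 38.5},
    {"systolic_bp": None, "diastolic_bp": 70, "heart_rate": 90, "resp_rate": None, "temperature": 37.0},
]
medians = fit_medians(train_rows)

patient = {
    "systolic_bp": None, "diastolic_bp": 60, "heart_rate": 100, "resp_rate": 24,
    "temperature": None, "arrival_hour": 23, "comorbidities": [1, 0, 1] + [0] * 22,
    "prior_admissions": 2, "prior_ed_visits": 3, "news2": 7, "pain": 5, "gcs": 14,
}
features = build_features(patient, medians)
```

The key design point is that fit_medians sees only training rows; the same medians are then reused for validation and test records, which is what keeps imputation from leaking information across splits.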

8

Section 08

Model Architecture and Experimental Results

The project tried multiple models. Each was evaluated with 5-fold stratified cross-validation and then retrained on the complete training set to generate test predictions.
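
To make the evaluation protocol concrete, here is a dependency-free sketch of stratified 5-fold cross-validation scored with Macro F1, followed by a retrain on all training data. It is an illustration under assumptions, not the project's pipeline: the lookup "model" is a toy stand-in for LightGBM or XGBoost, the fold assignment is a simple unshuffled round-robin per class, and the data is synthetic. In practice, scikit-learn's StratifiedKFold and f1_score(average="macro") provide the same building blocks.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=5):
    """Spread each class's sample indices evenly across the folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in truth or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fit_lookup(X, y):
    """Toy stand-in model: remember the most common label per feature value."""
    votes = defaultdict(Counter)
    for value, label in zip(X, y):
        votes[value][label] += 1
    table = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    fallback = Counter(y).most_common(1)[0][0]  # used for unseen feature values
    return table, fallback


def predict_lookup(model, X):
    table, fallback = model
    return [table.get(value, fallback) for value in X]


def cross_validate(X, y, n_splits=5):
    """Train on four folds, score Macro F1 on the held-out fold, repeat."""
    scores = []
    for held_out in stratified_folds(y, n_splits):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        model = fit_lookup([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict_lookup(model, [X[i] for i in held_out])
        scores.append(macro_f1([y[i] for i in held_out], preds))
    return scores


# Synthetic data: ten samples per ESI level, with a perfectly informative feature.
y = [level for level in range(1, 6) for _ in range(10)]
X = [level * 10 for level in y]

fold_scores = cross_validate(X, y)

# After validation, retrain on the full training set and predict unseen records.
final_model = fit_lookup(X, y)
test_predictions = predict_lookup(final_model, [10, 50, 30])
```

Macro F1 averages the per-class scores without weighting by class size, so rare but critical levels such as ESI 1 count as much as common ones; stratified folds keep every level represented in each validation split.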