Zing Forum

Decoding Emotions from Sleep Brain Waves: A Three-Stage Deep Learning Framework for Cross-Subject Machine Learning

This article introduces an open-source project that placed third in the Kaggle Sleep Emotion Prediction Competition. The project decodes emotional memory activation states from NREM sleep EEG signals using a three-stage deep learning pipeline: self-supervised pre-training, meta-learning, and spatiotemporal fusion.

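Tags: EEG, Sleep, Emotion Decoding, Machine Learning, Deep Learning, Self-Supervised Learning, MAML, Meta-Learning, Brain-Computer Interface, Neuroscience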
Published 2026-05-24 07:15 | Recent activity 2026-05-24 07:18 | Estimated read 7 min

Section 02

Original Author and Source

  • Original Author/Maintainer: MrSa3dola
  • Source Platform: GitHub
  • Original Project Name: TMR-EEG-Emotion-Decoding
  • Original Link: https://github.com/MrSa3dola/TMR-EEG-Emotion-Decoding
  • Publication Date: May 23, 2026
  • Competition Result: Third place in the Kaggle "Predicting Emotions During Sleep Using Brain Waves" competition (private leaderboard)

Section 03

Project Background and Research Significance

Emotional memory consolidation during sleep is an important topic in neuroscience. Targeted Memory Reactivation (TMR) can selectively strengthen the consolidation of specific memories by presenting cues tied to earlier learning (such as sounds or odors) during sleep. However, monitoring changes in emotional state during sleep in real time has long been a technical bottleneck in this field.

Traditional EEG analysis often relies on hand-crafted features and single-subject models, which makes cross-subject generalization difficult. To address this, the project builds an end-to-end deep learning pipeline that can predict, in real time, emotional versus neutral states from 16-channel sleep EEG at a temporal resolution of 200 samples per second. It placed third in the Kaggle Sleep Emotion Prediction Competition, which suggests the approach is effective.


Section 04

Task Definition

For each trial, the model must output the probability P(emotional) at each of the 200 time points (1 second sampled at 200 Hz). The evaluation metric is a composite AUC that rewards prediction windows whose performance stays consistently above chance.
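
The exact formula of the composite AUC is not given here. As a rough illustration of the per-time-point idea only, the sketch below computes one AUC per time point across trials. The function names and the toy data are hypothetical, and this is not the competition's scoring code.

```python
def auc(labels, scores):
    """Pairwise AUC: chance that a random emotional trial outscores
    a random neutral trial (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_timepoint_auc(labels, probs):
    """probs[trial][t] is P(emotional) at time point t.
    Returns a list with one AUC per time point."""
    n_time = len(probs[0])
    return [auc(labels, [trial[t] for trial in probs]) for t in range(n_time)]


# Toy data: 4 trials and 3 time points (a real trial has 200 time points).
labels = [1, 1, 0, 0]
probs = [
    [0.9, 0.5, 0.2],
    [0.8, 0.5, 0.6],
    [0.3, 0.5, 0.7],
    [0.1, 0.5, 0.4],
]
curve = per_timepoint_auc(labels, probs)
print(curve)  # [1.0, 0.5, 0.25]
```

An AUC of 0.5 is chance level, so a window of consecutive time points whose AUC stays above 0.5 is the kind of span the metric is described as rewarding.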

Section 05

Data Composition

  • Training Set: 14 subjects with emotional/neutral labels
  • Test Set: 3 subjects without labels
  • Signal Dimensions: 16 channels × 200 time points × 5 frequency bands (preprocessed)

The core challenge is the extremely low signal-to-noise ratio (SNR): individual trials are dominated by noise, and the separation between emotional and neutral states is only faintly visible in theta power after averaging across trials. The model therefore needs strong feature extraction and cross-subject generalization.
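
To see why an effect can be invisible in single trials yet show up after averaging, recall that averaging N trials with independent noise shrinks the noise standard deviation by a factor of sqrt(N). The sketch below checks this on synthetic Gaussian noise. The noise level and trial counts are made-up values, not properties of the competition data.

```python
import math
import random
import statistics


def averaged_noise_std(single_trial_std, n_trials):
    """Noise standard deviation left after averaging n independent trials."""
    return single_trial_std / math.sqrt(n_trials)


random.seed(0)
n_trials, n_repeats = 100, 2000

# Repeat the experiment "average 100 pure-noise trials" many times
# and measure how much the averages still fluctuate.
averages = [
    statistics.fmean([random.gauss(0.0, 1.0) for _ in range(n_trials)])
    for _ in range(n_repeats)
]
empirical = statistics.pstdev(averages)
predicted = averaged_noise_std(1.0, n_trials)
print(f"predicted {predicted:.3f}, empirical {empirical:.3f}")
```

With unit noise and 100 trials the predicted residual noise is 0.1, so a class difference much smaller than the single-trial noise only becomes detectable in the average. A per-trial classifier does not get that benefit.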


Section 06

Three-Stage Deep Learning Architecture

This project uses a three-stage training strategy that combines self-supervised learning, meta-learning, and multi-stream fusion.

Section 07

Stage 1: Self-Supervised Pre-Training (Masked EEG Autoencoder)

Stage 1 uses a Masked Autoencoder (MAE) for self-supervised pre-training on all data (training + test, no labels required). The model learns temporal representations of EEG by reconstructing randomly masked time points.

Key Designs:

  • Bidirectional Masking: The Transformer can see the context before and after the masked positions
  • 40% Masking Rate: Aggressive masking forces the model to learn meaningful temporal patterns
  • Shared Architecture: The MAE encoder has the same architecture as the subsequent channel expert, allowing direct weight transfer

For each channel, the input sequence (200 time points × 5 frequency bands) is linearly projected (5→16 dimensions), combined with positional encoding, and randomly masked at 40% of its time points. It then passes through a 3-layer Transformer encoder (d=16, 4 heads, FFN=128), and a final linear layer reconstructs the masked positions.
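
The masking step and a masked-position reconstruction loss can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the helper names are hypothetical, masked time points are filled with zeros (a learned mask token is another common choice), and the loss is taken over masked positions only. The project's actual implementation may differ on each of these points.

```python
import random

N_TIME, N_BANDS, MASK_RATIO = 200, 5, 0.4


def make_mask(n_time, mask_ratio, rng):
    """Boolean list with exactly round(n_time * mask_ratio) True entries."""
    n_masked = round(n_time * mask_ratio)
    masked_idx = set(rng.sample(range(n_time), n_masked))
    return [t in masked_idx for t in range(n_time)]


def apply_mask(x, mask, fill=0.0):
    """Replace every masked time point (a row of band values) with `fill`."""
    return [[fill] * len(row) if m else list(row) for row, m in zip(x, mask)]


def masked_mse(target, recon, mask):
    """Mean squared error computed over masked time points only."""
    errs = [
        (a - b) ** 2
        for row_t, row_r, m in zip(target, recon, mask) if m
        for a, b in zip(row_t, row_r)
    ]
    return sum(errs) / len(errs)


rng = random.Random(0)
# One channel of one trial: 200 time points x 5 band values (synthetic).
x = [[rng.gauss(0.0, 1.0) for _ in range(N_BANDS)] for _ in range(N_TIME)]
mask = make_mask(N_TIME, MASK_RATIO, rng)
x_masked = apply_mask(x, mask)

# Baseline loss of a "model" that just returns the masked input unchanged.
baseline_loss = masked_mse(x, x_masked, mask)
print(sum(mask), round(baseline_loss, 3))
```

In training, the encoder would receive x_masked, and its reconstruction would take the place of x_masked in the loss call. Because the encoder attends over the whole sequence, it can use context on both sides of each masked point.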

Section 08

Stage 2: Meta-Learning for Training Channel Experts (FOMAML)

Stage 2 trains 16 independent Transformer experts, one per EEG channel. Each expert is initialized from the MAE encoder weights and then meta-trained with First-Order MAML (FOMAML) so that it can adapt quickly to new subjects.

FOMAML Mechanism:

  • Inner Loop: 3 gradient updates to adapt to a specific subject
  • Outer Loop: Optimize meta-parameters across all subjects
  • LOPO Validation: Leave-one-subject-out cross-validation, holding out one participant per fold, to measure cross-subject generalization
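
The mechanism above can be sketched on a toy problem. In the example below, a two-parameter logistic model stands in for a Transformer expert, and the subjects are synthetic. The learning rates, the support/query split, and all function names are assumptions made for illustration. Only the 3 inner steps, the outer loop across subjects, and the held-out subject come from the description above.

```python
import math
import random


def grad_logloss(params, data):
    """Gradient of the mean logistic loss for p = sigmoid(w * x + b)."""
    w, b = params
    gw = gb = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        gw += (p - y) * x
        gb += p - y
    return [gw / len(data), gb / len(data)]


def sgd_step(params, grad, lr):
    return [p - lr * g for p, g in zip(params, grad)]


def fomaml_step(meta_params, subjects, inner_lr=0.1, outer_lr=0.05, inner_steps=3):
    """One outer-loop update; `subjects` is a list of (support, query) pairs."""
    meta_grad = [0.0] * len(meta_params)
    for support, query in subjects:
        adapted = list(meta_params)
        for _ in range(inner_steps):  # inner loop: adapt to one subject
            adapted = sgd_step(adapted, grad_logloss(adapted, support), inner_lr)
        # First-order approximation: take the query gradient at the adapted
        # parameters and ignore second derivatives through the inner loop.
        g = grad_logloss(adapted, query)
        meta_grad = [m + gi / len(subjects) for m, gi in zip(meta_grad, g)]
    return sgd_step(meta_params, meta_grad, outer_lr)  # outer loop: update shared init


def make_subject(rng, shift, n=40):
    """Synthetic subject: class 1 has a higher mean than class 0, plus an offset."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(shift + (1.0 if y else -1.0), 1.0), y))
    return data[: n // 2], data[n // 2:]  # (support, query)


rng = random.Random(0)
subjects = [make_subject(rng, rng.uniform(-0.5, 0.5)) for _ in range(14)]

# One leave-one-subject-out fold: hold out subject 0, meta-train on the other 13.
held_out, train_subjects = subjects[0], subjects[1:]
params = [0.0, 0.0]
for _ in range(200):
    params = fomaml_step(params, train_subjects)
print(params)
```

At test time the same inner loop would run on a few examples from the held-out subject, starting from the meta-learned parameters. A full LOPO evaluation repeats the fold once for each of the 14 subjects.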

Each channel expert has 19,489 parameters (about 19.5K), so the 16 experts total 311,824 parameters, or roughly 312K. This channel-level expert design lets the model capture patterns specific to different brain regions.
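
The per-expert figure can be reproduced from the stated hyperparameters under one plausible layout. The sketch below assumes a learned positional embedding (200 × 16), standard Transformer encoder layers with biases and two LayerNorms each, and a single-logit output head (16→1). These components are assumptions that happen to add up to 19,489. The breakdown has not been checked against the repository.

```python
D_MODEL, N_BANDS, N_TIME, FFN, N_LAYERS, N_CHANNELS = 16, 5, 200, 128, 3, 16


def linear(n_in, n_out):
    """Parameter count of a dense layer: weights plus bias."""
    return n_in * n_out + n_out


input_proj = linear(N_BANDS, D_MODEL)                         # 96
pos_embed = N_TIME * D_MODEL                                  # 3200 (assumed learned)
attention = 4 * linear(D_MODEL, D_MODEL)                      # Q, K, V, output: 1088
feed_forward = linear(D_MODEL, FFN) + linear(FFN, D_MODEL)    # 4240
layer_norms = 2 * (2 * D_MODEL)                               # 64 (scale and shift, twice)
encoder_layer = attention + feed_forward + layer_norms        # 5392
head = linear(D_MODEL, 1)                                     # 17 (assumed 16 -> 1 head)

per_expert = input_proj + pos_embed + N_LAYERS * encoder_layer + head
total = N_CHANNELS * per_expert
print(per_expert, total)  # 19489 311824
```

The number of attention heads does not appear in the count, because splitting d=16 across 4 heads changes how the projections are used, not how many weights they hold.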