# OnDeviceTraining: An Edge Learning Framework for Implementing Neural Network Training on Microcontrollers

> OnDeviceTraining is a lightweight C/CMake framework that supports deep neural network inference and local training on resource-constrained microcontrollers (MCUs) and host PCs, filling the gap in training capabilities in the TinyML field.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-04T20:14:33.000Z
- Last activity: 2026-06-04T20:18:44.872Z
- Popularity: 161.9
- Keywords: TinyML, edge computing, neural network training, MCU, embedded AI, continual learning, backpropagation, memory optimization, model quantization
- Page link: https://www.zingnex.cn/en/forum/thread/ondevicetraining
- Canonical: https://www.zingnex.cn/forum/thread/ondevicetraining

---

## OnDeviceTraining: A Lightweight Framework for On-MCU Neural Network Training

OnDeviceTraining is a lightweight C/CMake framework that supports both deep neural network inference and local training on resource-constrained microcontrollers (MCUs) and on host PCs. It addresses a gap in TinyML, where most frameworks focus only on inference.

Basic Info:
- Author/Maintainer: es-ude (University of Duisburg-Essen Embedded Systems team)
- Source: GitHub (https://github.com/es-ude/OnDeviceTraining)
- Release Time: 2026-06-04

## Background: Limitations of Traditional TinyML & The Need for On-Device Training

TinyML has become popular in edge computing, but established frameworks such as TensorFlow Lite for Microcontrollers and CMSIS-NN target inference only: a model is trained elsewhere and deployed to the device in fixed form. This has several limitations:
- No personalized adjustments based on local data
- Inability to adapt to new environments or enable continuous learning
- Reliance on cloud updates (problematic for offline/privacy-sensitive scenarios)

OnDeviceTraining targets this gap by enabling full backpropagation training on MCUs.
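To make "full backpropagation on an MCU" concrete, here is a minimal sketch of one training step for a single linear layer with a squared-error loss. It is not taken from the OnDeviceTraining sources: the names `dense_t`, `dense_forward`, and `dense_sgd_step` and the sizes `N_IN` and `N_OUT` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_IN  2   /* input features (illustrative size) */
#define N_OUT 1   /* output units (illustrative size) */

/* Hypothetical layer type: all state lives in fixed-size arrays, no heap. */
typedef struct {
    float w[N_OUT][N_IN];
    float b[N_OUT];
} dense_t;

/* Forward pass: y = W x + b (linear layer, no activation). */
static void dense_forward(const dense_t *l, const float x[N_IN], float y[N_OUT])
{
    for (size_t o = 0; o < N_OUT; ++o) {
        float acc = l->b[o];
        for (size_t i = 0; i < N_IN; ++i)
            acc += l->w[o][i] * x[i];
        y[o] = acc;
    }
}

/* One SGD step on the loss L = 0.5 * sum((y - t)^2).
 * Backward pass: dL/dy[o] = y[o] - t[o],
 *                dL/dW[o][i] = dL/dy[o] * x[i], dL/db[o] = dL/dy[o].
 * Returns the loss measured before the update. */
static float dense_sgd_step(dense_t *l, const float x[N_IN],
                            const float t[N_OUT], float lr)
{
    float y[N_OUT];
    float loss = 0.0f;

    assert(lr > 0.0f);
    dense_forward(l, x, y);
    for (size_t o = 0; o < N_OUT; ++o) {
        float dy = y[o] - t[o];
        loss += 0.5f * dy * dy;
        for (size_t i = 0; i < N_IN; ++i)
            l->w[o][i] -= lr * dy * x[i];
        l->b[o] -= lr * dy;
    }
    return loss;
}
```

Calling `dense_sgd_step` repeatedly on the same sample should drive the returned loss toward zero. A multi-layer network would also need to propagate the gradient to the layer input, which this sketch omits.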

## Project Overview: Dual-Platform Unified Architecture

The framework uses C/CMake for cross-environment support:
1. **MCU Platform**: Optimized for limited resources (tens of KB RAM, hundreds of KB Flash) with memory efficiency and computation optimization.
2. **PC/Host Platform**: Allows fast iteration, debugging, and validation.

A key feature is "Host Equivalence": the same model code produces consistent results on PC and MCU, so a problem observed on the device can be reproduced and debugged on the host.
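One way such consistency could be checked is to compare the outputs recorded on the host with those read back from the device. The sketch below assumes a tolerance-based comparison; the helper names `max_abs_diff` and `outputs_match` are hypothetical, and the source does not say whether the framework aims for bit-exact or tolerance-bounded agreement.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute element-wise difference between two buffers. */
static float max_abs_diff(const float *a, const float *b, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Returns 1 if every element of `actual` is within `tol` of `golden`.
 * Pass tol = 0.0f to demand exact agreement. */
static int outputs_match(const float *golden, const float *actual,
                         size_t n, float tol)
{
    assert(tol >= 0.0f);
    return max_abs_diff(golden, actual, n) <= tol;
}
```

In practice the `golden` buffer might come from a host run and `actual` from the MCU over a debug link. Floating-point results can differ slightly between compilers and FPUs, which is why the tolerance is a parameter here.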

## Core Technical Features

- **Memory-First Design**:
  - Static memory planning (no dynamic allocation, buffer sizes fixed at compile time)
  - Buffer reuse to lower peak memory usage
  - Gradient checkpointing (trading extra computation for lower peak memory; the roadmap lists this as planned work)
- **Operator Support**: Basic forward/backward operators, loss functions, optimizers (SGD, Momentum, Adam variants with configurable state storage).
- **Quantization Training**: Plans for quantization-aware training (QAT), integer-friendly variants, and mixed precision strategies.
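The static planning and buffer reuse ideas above can be illustrated with two compile-time-sized activation buffers that alternate roles from layer to layer. This is a sketch under stated assumptions, not the framework's allocator: `MAX_ACT`, `act_a`, `act_b`, `layer_double`, and `run_layers` are hypothetical names, and the toy layer only stands in for a real operator.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACT 8  /* widest activation vector, fixed at compile time (illustrative) */

/* Two statically allocated buffers, reused by alternating layers. */
static float act_a[MAX_ACT];
static float act_b[MAX_ACT];

/* Toy "layer": doubles each element. Stands in for a real operator. */
static void layer_double(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = 2.0f * in[i];
}

/* Runs `depth` layers over n values using only the two static buffers.
 * Returns a pointer to the buffer holding the final activations. */
static const float *run_layers(const float *input, size_t n, size_t depth)
{
    float *src = act_a;
    float *dst = act_b;

    assert(n <= MAX_ACT);
    for (size_t i = 0; i < n; ++i)
        src[i] = input[i];
    for (size_t d = 0; d < depth; ++d) {
        layer_double(src, dst, n);
        float *tmp = src;  /* swap roles: the output becomes the next input */
        src = dst;
        dst = tmp;
    }
    return src;
}
```

Activation memory stays at `2 * MAX_ACT` floats regardless of depth. This pattern suits forward-only passes; training must also keep, or recompute, whatever activations the backward pass needs, which is the trade-off gradient checkpointing addresses.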

## Application Scenarios & Value

- **Local Personalization**: Smart home devices optimize models based on user habits without cloud data upload.
- **Continuous Learning**: Industrial sensors learn new fault modes for predictive maintenance.
- **Offline Environments**: Field monitoring devices improve their models without network access.
- **Privacy Protection**: Medical wearables fine-tune models locally, keeping sensitive data on-device.

## Technical Implementation Details

- **Project Structure**:
  - `src/`: Core source code
  - `test/unit/`: Unit tests
  - `cmake/`: Build scripts
  - `CMakePresets.json`: Reproducible configs
- **Design Principles**:
  1. Portability (training core independent of heavy runtimes/OS)
  2. MCU Realism (optimized for peak RAM, predictable memory behavior)
  3. Host Equivalence (consistent results across PC/MCU)
  4. Progressive Complexity (add features without breaking baseline)

## Future Development Plans

The roadmap includes:
- Expand operator support for more network architectures
- Add more optimizer variants with configurable state storage
- Implement gradient checkpointing and recomputation
- Develop operator fusion to reduce memory and boost throughput
- Build an example model library (XOR, time series classification, small CNNs)
- Add performance analysis hooks (MACs, buffer usage, parameter stats)
- Set up CI for host builds and sanity tests
- Provide reference ports for mainstream MCU series
- Define a clear hardware abstraction boundary (platform-independent core)
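As an illustration of what the planned performance analysis hooks might report, the sketch below computes multiply-accumulate counts, parameter counts, and the widest activation vector for a stack of fully connected layers. The types `dense_shape_t` and `model_stats_t` and the function `collect_stats` are hypothetical; the roadmap does not specify an interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape description of a fully connected layer. */
typedef struct {
    size_t n_in;
    size_t n_out;
} dense_shape_t;

/* Hypothetical summary a profiling hook could report. */
typedef struct {
    size_t macs;      /* multiply-accumulates per forward pass */
    size_t params;    /* weights plus biases */
    size_t peak_act;  /* widest activation vector, in elements */
} model_stats_t;

static model_stats_t collect_stats(const dense_shape_t *layers, size_t count)
{
    model_stats_t s = {0, 0, 0};

    assert(layers != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        size_t in = layers[i].n_in;
        size_t out = layers[i].n_out;
        s.macs += in * out;          /* one MAC per weight */
        s.params += in * out + out;  /* weight matrix plus bias vector */
        if (in > s.peak_act)
            s.peak_act = in;
        if (out > s.peak_act)
            s.peak_act = out;
    }
    return s;
}
```

For a 2-4-1 network this yields 12 MACs, 17 parameters, and a peak activation width of 4. Numbers like these help decide whether a model fits a given MCU before it is flashed.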

## Key Takeaways & Conclusion

OnDeviceTraining marks a shift in TinyML from inference-only deployment to training at the edge, so that devices can learn from their own data. For developers, it is:
- Research-friendly (clear code structure for experiments)
- Practical (MIT license, CMake build, easy integration)
- Cross-platform consistency (reduces debugging cost)

As IoT grows, on-device training will be critical for privacy, offline adaptation, and continuous learning. OnDeviceTraining provides a solid foundation for this trend.
