Implementing Classic Machine Learning Algorithms from Scratch: Analysis of Rice University's CMOR 438 Open-Source Teaching Project

Qiushi Han, a student at Rice University, has open-sourced rice_ml, a machine learning library written entirely by hand. It includes NumPy implementations of 14 classic algorithms, 45 unit tests, and 12 teaching notebooks, giving learners transparent, readable code for understanding how the algorithms work.

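Tags: machine learning education, NumPy, Rice University, algorithm implementation, open source, supervised learning, unsupervised learning, Python, education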
Published 2026-05-02 08:45 | Recent activity 2026-05-02 09:52 | Estimated read 6 min

Section 01

[Introduction] Rice University's rice_ml Open-Source Project: Teaching Resources for Implementing Classic ML Algorithms from Scratch

Qiushi Han, a student in Rice University's CMOR 438/INDE 577 course, open-sourced the rice_ml project, which implements 14 classic machine learning algorithms from scratch in pure NumPy and ships with 45 unit tests and 12 teaching notebooks. The project aims to help learners understand the underlying principles of the algorithms rather than just call APIs, offering transparent code that balances teaching clarity with engineering practicality.

Section 02

Project Background and Positioning

Deep learning frameworks are now so heavily encapsulated that many developers have only a superficial understanding of the algorithms underneath. As a course teaching project, rice_ml is built around two goals, algorithm transparency and mathematical intuition: the code maps directly onto the mathematical formulas and avoids high-level abstractions. The project follows modern Python packaging standards (pyproject.toml) and has a complete CI/CD pipeline in which the 45 unit tests run automatically via GitHub Actions, which puts it at the level of a maintainable open-source project.

Section 03

Detailed Explanation of Core Algorithm Implementations

Supervised Learning Module

The module covers linear regression (OLS/Ridge/gradient descent), logistic regression (sigmoid/cross-entropy), KNN, the perceptron (Rosenblatt rule), MLP (backpropagation), and decision trees (information gain/variance reduction), among others. It also includes a random forest, which is reported to reach 100% accuracy on the Wine dataset.
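
To make the OLS/Ridge item concrete, here is a minimal sketch of closed-form linear regression in NumPy. The function names `fit_ridge` and `predict` are hypothetical and are not taken from rice_ml; leaving the intercept unpenalized is also an assumption of this sketch.

```python
import numpy as np


def fit_ridge(X, y, alpha=0.0):
    """Closed-form linear regression; alpha=0 reduces to ordinary least squares.

    Solves (Xb^T Xb + alpha * I) w = Xb^T y, where Xb is X with a bias column.
    Hypothetical helper for illustration, not rice_ml's API.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # assumption: the intercept is not regularized
    w = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
    return w[0], w[1:]  # intercept, coefficients


def predict(X, intercept, coef):
    """Apply the fitted linear model to new rows."""
    return np.asarray(X, dtype=float) @ coef + intercept


# Noise-free data generated from y = 1 + 2x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + 2.0 * X[:, 0]

intercept, coef = fit_ridge(X, y)            # OLS recovers the generating line
_, coef_ridge = fit_ridge(X, y, alpha=10.0)  # the penalty shrinks the slope
```

With `alpha=0` the fit recovers the line exactly on this noise-free data, while a positive `alpha` pulls the slope toward zero, which is the trade-off ridge regression makes.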

Unsupervised Learning Module

The module includes K-Means (Lloyd's algorithm + elbow method), DBSCAN (density clustering + noise labeling), PCA (eigenvalue decomposition + 95% variance retention), and label propagation (87.5% accuracy with only 20% of the data labeled, in a semi-supervised setting), among others.
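
The PCA item (eigenvalue decomposition with 95% variance retention) can be sketched as follows. This is an illustrative version under stated assumptions, not the project's code: `pca_fit` and `variance_to_keep` are hypothetical names.

```python
import numpy as np


def pca_fit(X, variance_to_keep=0.95):
    """PCA via eigendecomposition of the covariance matrix.

    Keeps the smallest number of components whose cumulative explained
    variance ratio reaches `variance_to_keep`. Hypothetical helper.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                               # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratios = eigvals / eigvals.sum()
    # First index where the cumulative ratio reaches the threshold, plus one
    k = int(np.searchsorted(np.cumsum(ratios), variance_to_keep)) + 1
    k = min(k, len(eigvals))                    # guard against rounding
    return mean, eigvecs[:, :k], ratios[:k]


# Three features, but almost all variance lies along one direction
t = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([t, 2.0 * t, 0.01 * np.sin(37.0 * t)])

mean, components, ratios = pca_fit(X)
Z = (X - mean) @ components                     # project onto kept components
```

`np.linalg.eigh` is used because the covariance matrix is symmetric; since it returns eigenvalues in ascending order, the explicit descending sort is what makes "keep the first k components" meaningful.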

Section 04

Supporting Tools and Evaluation System

The project provides a complete toolchain:

  • Preprocessing: feature scaling with StandardScaler/MinMaxScaler, stratified sampling;
  • Evaluation metrics: Accuracy, MSE, R², confusion matrix, precision/recall;
  • Teaching notebooks: Demonstrate algorithms using real datasets (diabetes, breast cancer, wine, digit recognition, etc.), forming a systematic learning path.
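
As a rough illustration of the preprocessing and evaluation pieces listed above, the sketch below standardizes features using training statistics only and derives precision and recall from a confusion matrix. The function names are hypothetical and do not reflect rice_ml's actual interfaces.

```python
import numpy as np


def standardize(X_train, X_test):
    """Scale both sets with the mean and std of the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm


def precision_recall(cm, positive=1):
    """Precision and recall for one class, read off the confusion matrix."""
    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp
    fn = cm[positive, :].sum() - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


X_train = np.array([[1.0, 10.0], [3.0, 10.0]])
X_test = np.array([[2.0, 10.0]])
train_scaled, test_scaled = standardize(X_train, X_test)

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=2)
precision, recall = precision_recall(cm)
accuracy = np.trace(cm) / cm.sum()
```

Fitting the scaler on the training split and reusing its statistics on the test split keeps information from the test data out of preprocessing.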

Section 05

Teaching and Practical Value

  • Educators: Can be used directly in the classroom, and students can modify the code to run their own experiments (e.g., changing the distance metric or the tree splitting criterion);
  • Self-learners: Bridges the gap between theory and industrial code (more approachable than scikit-learn's Cython source code);
  • Interview candidates: Provides references for handwritten algorithms (classic implementations like backpropagation, decision tree recursion, etc.).
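
The kind of experiment mentioned above, swapping the distance metric in KNN, might look like the sketch below. It is a hypothetical illustration: `pairwise_distance`, `knn_predict`, and the `metric` parameter are assumed names, not rice_ml's API.

```python
import numpy as np


def pairwise_distance(A, B, metric="euclidean"):
    """Distance from every row of A to every row of B."""
    diff = A[:, None, :] - B[None, :, :]  # shape (len(A), len(B), n_features)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(X_train, y_train, X_query, k=3, metric="euclidean"):
    """Majority vote among the k nearest training points."""
    dists = pairwise_distance(X_query, X_train, metric)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest
    votes = y_train[nearest]                    # labels of those neighbors
    return np.array([np.bincount(row).argmax() for row in votes])


X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.5, 0.5], [5.5, 5.5]])

pred_l2 = knn_predict(X_train, y_train, X_query, metric="euclidean")
pred_l1 = knn_predict(X_train, y_train, X_query, metric="manhattan")
```

Because the metric is isolated in one function, a student can add another distance (for example Chebyshev) without touching the voting logic.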

Section 06

Technical Highlights and Engineering Practices

Engineering level: The project adopts a src directory layout, pyproject.toml (compliant with PEP 517/518), pytest-based tests, and a CI pipeline to ensure quality. Algorithm level: It pays attention to numerical stability and related implementation details, such as learning rate scheduling for gradient descent, multiple random initializations for K-Means, and sorting of eigenvalues in PCA.
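
One of the details named above, multiple random initializations for K-Means, can be sketched as running Lloyd's algorithm several times and keeping the run with the lowest inertia. This is an assumption about how such restarts are typically done, with hypothetical names (`lloyd`, `kmeans`, `n_init`), not a copy of the project's implementation.

```python
import numpy as np


def lloyd(X, k, rng, n_iter=100):
    """One run of Lloyd's algorithm from a random choice of k data points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if it is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(X)), labels].sum()
    return centers, labels, inertia


def kmeans(X, k, n_init=10, seed=0):
    """Run Lloyd's algorithm n_init times and keep the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    runs = [lloyd(X, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels, inertia = kmeans(X, k=2)
```

Lloyd's algorithm only finds a local optimum, so restarting from different seeds and comparing inertia is a cheap way to reduce sensitivity to a poor starting point.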

Section 07

Summary and Outlook

rice_ml shows that high-quality educational code can deliver both teaching and engineering value. The project is released under the MIT license, which allows free use and modification, and it is suitable for course reference, interview review, and algorithm benchmarking. In AI education, this build-from-scratch approach can help cultivate the next generation of AI talent.