Zing Forum

From Physics to Data Science: A Cross-Disciplinary Developer's Practical Machine Learning Portfolio

A collection of machine learning projects by Mexican physicist Luis Gerardo Ramírez Archundia. It covers practical cases such as SQL data analysis, time series prediction, coffee shop sales analysis, and economic indicator clustering, and shows how physics thinking can be combined with data science methods.

Tags: machine learning, data science, portfolio, physics, time series, clustering, SQL, Python, data analysis
Published 2026-06-08 12:10 | Recent activity 2026-06-08 12:18 | Estimated read 6 min

Section 01

[Introduction] A Practical Machine Learning Portfolio from a Cross-Disciplinary Developer Trained in Physics

This portfolio by Mexican physicist Luis Gerardo Ramírez Archundia gathers practical machine learning cases: SQL data analysis, time series prediction, coffee shop sales analysis, and economic indicator clustering. Together they show how physics thinking combines with data science methods. The project is open source on GitHub and continuously maintained, which makes it a practical reference for learners.

Section 02

Project Background and Author Introduction

The author, Luis Gerardo Ramírez Archundia, is a physics graduate from Mexico with research experience in quantum chromodynamics and expertise in machine learning, and this interdisciplinary background informs his approach to the projects. The collection spans the tech stack from exploratory analysis to deep learning, and each project includes detailed documentation: a problem definition, the methodology, and an analysis of the results.

Section 03

Methodology of Core Projects

  1. SQL Supermarket Analysis: Multi-table join queries; GROUP BY, subqueries, and window functions; data aggregation and query optimization.
  2. Coffee Sales Time Series Prediction: Data preprocessing (missing value handling, date conversion), feature engineering (time feature extraction, one-hot encoding), and a linear regression model evaluated on an 80-20 chronological train/test split.
  3. Coffee Shop Sales Analysis: Analysis of revenue structure and time patterns based on more than 149,000 transactions.
  4. Economic Indicator Clustering: Analysis of 11 indicators from 96 countries using several algorithms, including K-Means, hierarchical clustering, and DBSCAN.
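
The steps listed for the coffee sales prediction (item 2) can be sketched roughly as follows. This is a minimal illustration, not the author's code: the data is synthetic, and the column names (`date`, `sales`), the interpolation strategy, and the chosen features are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic daily sales (hypothetical data, not the project's dataset).
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=200, freq="D")
sales = (
    50
    + 5 * (dates.dayofweek >= 5)   # weekend uplift
    + 0.05 * np.arange(200)        # slow upward trend
    + rng.normal(0, 1, 200)        # noise
)
df = pd.DataFrame({"date": dates.astype(str), "sales": sales})
df.loc[[10, 50], "sales"] = np.nan  # simulate missing values

# Preprocessing: date conversion and missing-value handling.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date").reset_index(drop=True)
df["sales"] = df["sales"].interpolate()  # assumed strategy: linear interpolation

# Feature engineering: a running day index plus a one-hot encoded weekday.
df["day_index"] = np.arange(len(df))
weekday = pd.get_dummies(df["date"].dt.dayofweek, prefix="dow", drop_first=True)
X = pd.concat([df[["day_index"]], weekday], axis=1).astype(float)
y = df["sales"]

# 80-20 chronological split: the last 20% of days is held out, no shuffling.
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on the held-out final 20%: {mae:.2f}")
```

Splitting by position rather than at random keeps every training day earlier than every test day, so the reported error reflects forecasting into unseen dates.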

Section 04

Key Evidence and Results of the Projects

  1. SQL Supermarket Analysis: Reveals the regional distribution of sales, changes in profit margin, and customer purchase patterns.
  2. Coffee Time Series Prediction: The model reaches an MAE of $0.48, which the project presents as accurate enough to support inventory planning.
  3. Coffee Shop Analysis: The coffee category accounts for 38.6% of revenue (led by Barista Espresso) and the tea category for 28.1%; high-end products perform best at Hell's Kitchen; the morning peak runs from 7 to 10 AM, and the peak season is May-June.
  4. Economic Indicator Clustering: Countries group by development level, and the economic and environmental indicators effectively distinguish different country types.
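
For the clustering result in item 4, a comparison of the three named algorithms might look like the sketch below. It is an illustration under stated assumptions: the data is a synthetic stand-in shaped like the project's table (96 rows, 11 columns), and the cluster count, `eps`, and `min_samples` values are placeholders rather than settings taken from the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 96 rows ("countries") x 11 columns ("indicators").
X, _ = make_blobs(n_samples=96, n_features=11, centers=3, random_state=0)

# Indicators usually live on different scales, so standardize them first.
X_scaled = StandardScaler().fit_transform(X)

# Hyperparameters below are illustrative assumptions, not tuned values.
models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "dbscan": DBSCAN(eps=2.0, min_samples=4),
}
labels = {name: m.fit_predict(X_scaled) for name, m in models.items()}

for name, lab in labels.items():
    clustered = lab != -1  # DBSCAN labels noise points as -1
    n_clusters = len(set(lab[clustered]))
    if n_clusters >= 2:
        score = silhouette_score(X_scaled[clustered], lab[clustered])
        print(f"{name}: {n_clusters} clusters, silhouette={score:.2f}")
    else:
        print(f"{name}: {n_clusters} cluster(s), silhouette undefined")
```

K-Means and hierarchical clustering need the number of clusters up front, while DBSCAN infers it from density and can leave outliers unassigned, which is one reason to run several algorithms on the same indicators.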

Section 05

Tech Stack and Toolchain

Programming Languages: Python 3.10+, SQL; Data Processing: Pandas, NumPy; Visualization: Matplotlib, Seaborn, Plotly; Machine Learning: Scikit-learn, TensorFlow, PyTorch; Databases: MySQL, PostgreSQL; Development Environment: Jupyter Notebooks, Git & GitHub.
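
The SQL techniques named for the supermarket analysis (joins, GROUP BY, window functions) can be tried from Python without a database server. The sketch below swaps in Python's built-in `sqlite3` module for MySQL or PostgreSQL, and the `stores` and `sales` tables with their rows are made up for the example. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);
    INSERT INTO stores VALUES (1, 'North'), (2, 'North'), (3, 'South');
    INSERT INTO sales VALUES
        (1, 1, 100.0), (2, 1, 50.0), (3, 2, 200.0), (4, 3, 80.0), (5, 3, 20.0);
    """
)

query = """
WITH store_totals AS (
    -- Join + GROUP BY: revenue per store
    SELECT st.region, st.store_id, SUM(sa.amount) AS revenue
    FROM sales AS sa
    JOIN stores AS st ON st.store_id = sa.store_id
    GROUP BY st.region, st.store_id
)
SELECT region,
       store_id,
       revenue,
       -- Window functions: rank within region and regional total
       RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rank_in_region,
       SUM(revenue) OVER (PARTITION BY region) AS region_revenue
FROM store_totals
ORDER BY region, rank_in_region;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The window functions keep one row per store while still exposing the regional total, which a plain GROUP BY on region would collapse.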

Section 06

Learning Value and Insights

The collection walks through the end-to-end ML project lifecycle: data preprocessing and feature engineering, statistical analysis and hypothesis testing, model selection and tuning, time series analysis, unsupervised learning, SQL query optimization, and storytelling through visualization. For learners moving from theory to practice it serves as a reference template, with an emphasis on systematic thinking and problem solving.

Section 07

Conclusion and Recommendations

An interdisciplinary background such as physics carries real value in data science: systematic thinking, mathematical modeling, and rigorous analysis match what ML work requires. The project is continuously maintained, and the author plans to add new projects. Machine learning learners who want practical experience are encouraged to follow this open-source resource.