Zing Forum

Machine Learning-Based Lunar Phase Visibility Prediction: An Astronomical Data Science Practice

This project applies machine learning to lunar phase visibility data, combining astronomical and geographical features to predict whether the crescent can be seen. It covers the complete data science workflow (data cleaning, feature engineering, geographic visualization, model training, and evaluation) and compares the performance of two algorithms: logistic regression and random forest.

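Machine Learning, Astronomy, Lunar Phase Prediction, Data Science, Python, Random Forest, Logistic Regression, Geographic Visualization, Feature Engineering, Classification Model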
Published 2026-06-02 07:14 | Recent activity 2026-06-02 07:23 | Estimated read 7 min

Section 01

[Introduction] Core Overview of the Machine Learning-Based Lunar Phase Visibility Prediction Project

The project, Machine Learning-Based Lunar Phase Visibility Prediction: An Astronomical Data Science Practice, was developed by Miled Trabelssi (GitHub username: master291004), a computer engineering student. The source code is available on GitHub at https://github.com/master291004/crescent-visibility-analysis. The project trains two classifiers, logistic regression and random forest, on lunar phase visibility data, combining astronomical and geographical features to predict visibility. It covers the complete data science workflow (data cleaning, feature engineering, geographic visualization, model training, and evaluation) and compares the performance of the two algorithms.

Section 02

Project Background and Significance

Predicting crescent visibility is a classic problem in observational astronomy, with important applications in calendar determination (e.g., the start of months in the Islamic calendar). Whether the crescent can be seen depends on several interacting factors, including astronomical parameters, geographical location, and atmospheric conditions, which makes accurate prediction difficult. This project recasts the traditional astronomical problem as a machine learning task: by analyzing historical observation data, it aims to identify the key influencing factors and build prediction models.

Section 03

Dataset and Feature Engineering

The project uses a dataset of historical records from multiple observation points around the world. Features include geographical location (latitude, longitude), crescent geometry (crescent width, arc of vision, arc of light), astronomical measurements (azimuth difference, altitude angle), a time parameter (the lag between sunset and moonset), and the target label (visibility: 0 for not visible, 1 for visible). Preprocessing consists of removing rows with missing values, converting time strings to a numerical time difference, label encoding, and renaming features. All steps are encapsulated in the src/data_processing.py module to ensure reproducibility.
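The four preprocessing steps can be sketched with pandas. This is a minimal illustration, not the contents of src/data_processing.py: the raw column names (`Lat`, `Lon`, `W`, `Sunset`, `Moonset`, `Visibility`), the "HH:MM" time format, and the label strings are all assumptions made for the example.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw observation table (all column names are hypothetical)."""
    # 1. Missing value removal: drop any row with a missing field.
    out = df.dropna().copy()

    # 2. Time feature conversion: "HH:MM" strings -> minutes since midnight,
    #    then the lag between moonset and sunset as one numeric feature.
    sunset = pd.to_timedelta(out["Sunset"] + ":00").dt.total_seconds() / 60
    moonset = pd.to_timedelta(out["Moonset"] + ":00").dt.total_seconds() / 60
    out["lag_minutes"] = moonset - sunset

    # 3. Label encoding: 0 for not visible, 1 for visible.
    out["visible"] = out["Visibility"].map({"Invisible": 0, "Visible": 1}).astype(int)

    # 4. Feature renaming: short raw headers -> descriptive names.
    out = out.rename(
        columns={"Lat": "latitude", "Lon": "longitude", "W": "crescent_width"}
    )
    return out.drop(columns=["Sunset", "Moonset", "Visibility"]).reset_index(drop=True)


# Tiny made-up table; the third row has a missing latitude and is dropped.
raw = pd.DataFrame(
    {
        "Lat": [36.8, 21.4, None],
        "Lon": [10.2, 39.8, 3.1],
        "W": [0.25, 0.10, 0.30],
        "Sunset": ["18:30", "18:05", "19:00"],
        "Moonset": ["19:15", "18:25", "19:40"],
        "Visibility": ["Visible", "Invisible", "Visible"],
    }
)
clean = preprocess(raw)
print(clean)
```

Keeping every step inside one function means the same cleaning can be re-run unchanged on new data, which is the reproducibility benefit the module is meant to provide.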

Section 04

Data Visualization and Exploratory Analysis

The project relies on visual analysis to understand patterns in the data. Statistical visualization includes class distribution charts (to identify class imbalance), histograms of numerical features (to detect anomalies and skewness), and correlation heatmaps (to assist feature selection). Geospatial visualization includes global visibility distribution maps and maps that classify observation points as visible or not visible. All figures are saved in the results/figures/ directory.
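A rough sketch of these plots with matplotlib follows. The data is synthetic, the column names are hypothetical, and the function `save_eda_figures` is an illustrative helper rather than the project's visualization module. The last figure is a plain longitude/latitude scatter standing in for a real map, since the source does not say which mapping library the project uses. The demo writes to a temporary directory instead of results/figures/.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render straight to files; no display required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def save_eda_figures(df: pd.DataFrame, target: str, out_dir) -> list:
    """Save four exploratory figures and return their paths.

    Assumes every column is numeric and that the frame has
    'longitude' and 'latitude' columns (hypothetical names).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    features = df.drop(columns=[target])
    paths = []

    def finish(fig, name):
        fig.tight_layout()
        paths.append(out_dir / name)
        fig.savefig(paths[-1])
        plt.close(fig)

    # 1. Class distribution: shows any imbalance between the two labels.
    fig, ax = plt.subplots()
    counts = df[target].value_counts().sort_index()
    ax.bar([str(label) for label in counts.index], counts.to_numpy())
    ax.set_xlabel(target)
    ax.set_ylabel("observations")
    finish(fig, "class_distribution.png")

    # 2. One histogram per feature: shows skewness and outliers.
    n = features.shape[1]
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3), squeeze=False)
    for ax, col in zip(axes[0], features.columns):
        ax.hist(features[col], bins=20)
        ax.set_title(col)
    finish(fig, "feature_histograms.png")

    # 3. Correlation heatmap over all columns, including the target.
    corr = df.corr()
    fig, ax = plt.subplots()
    image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax)
    finish(fig, "correlation_heatmap.png")

    # 4. Observation points coloured by label (not a true map projection).
    fig, ax = plt.subplots()
    ax.scatter(df["longitude"], df["latitude"], c=df[target], cmap="coolwarm", s=12)
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    finish(fig, "observation_points.png")
    return paths


# Synthetic stand-in data: visibility here depends only on the lag time.
rng = np.random.default_rng(0)
lag = rng.uniform(10, 90, 200)
demo = pd.DataFrame(
    {
        "latitude": rng.uniform(-40, 60, 200),
        "longitude": rng.uniform(-120, 140, 200),
        "crescent_width": rng.uniform(0.05, 0.6, 200),
        "lag_minutes": lag,
    }
)
demo["visible"] = (lag > 40).astype(int)
figure_paths = save_eda_figures(demo, "visible", tempfile.mkdtemp())
```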

Section 05

Machine Learning Models and Evaluation

The project compares two supervised classification models: logistic regression (a linear baseline that is easy to interpret) and random forest (a nonlinear ensemble model that can capture interactions between features and is generally robust). The data is split 80/20 into training and test sets, and evaluation metrics include accuracy, precision, recall, F1 score, the confusion matrix, the ROC curve, and AUC. According to the reported results, random forest outperforms logistic regression, and the feature importance analysis offers data-driven guidance for observation practice.
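The comparison described above might look roughly like this in scikit-learn. It is a sketch under stated assumptions: the data comes from `make_classification` rather than the crescent dataset, and the stratified split, the `StandardScaler` step, and all hyperparameters are choices made for the example, not settings confirmed by the source. Scores from this synthetic data say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned feature matrix and 0/1 visibility labels.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# 80/20 split; stratifying keeps the class ratio equal in both parts (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    # Scaling helps the linear model converge; tree ensembles do not need it.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}


def evaluate(model):
    """Fit on the training split and score on the held-out test split."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
        "auc": roc_auc_score(y_test, scores),
        "confusion_matrix": confusion_matrix(y_test, predictions),
        "roc_curve": roc_curve(y_test, scores),  # (fpr, tpr, thresholds)
    }


results = {name: evaluate(model) for name, model in models.items()}

# Impurity-based importances from the fitted forest; they sum to 1.
importances = models["random_forest"].feature_importances_

for name, metrics in results.items():
    summary = {
        key: round(float(metrics[key]), 3)
        for key in ("accuracy", "precision", "recall", "f1", "auc")
    }
    print(name, summary)
```

Evaluating both models on the same held-out split is what makes the comparison fair: any difference in the metrics then comes from the models rather than from the data each one happened to see.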

Section 06

Project Structure and Code Organization

The project follows a standard data science layout. The crescent-visibility-analysis/ directory contains four main subdirectories: data (raw and processed data), notebooks (exploratory analysis and modeling experiments), src (reusable modules for data processing, visualization, model training, and evaluation), and results (figures and metrics). The code is organized into modules by function, which makes it easier to understand and reproduce.

Section 07

Potential Improvement Directions

Possible extensions include hyperparameter optimization (grid search or randomized search), K-fold cross-validation, trying other algorithms such as XGBoost or SVM, using SHAP values to explain model decisions, and exploring time-series and seasonal patterns.
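The first two suggestions combine naturally, because scikit-learn's `GridSearchCV` scores every hyperparameter combination with K-fold cross-validation. The sketch below is illustrative only: the data is synthetic, and the parameter grid, the F1 scoring, and the five stratified folds are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the visibility dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Five stratified folds: each fold keeps the overall class ratio.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# A deliberately small grid: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=folds,
    scoring="f1",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean cross-validated F1:", round(search.best_score_, 3))
```

Swapping `GridSearchCV` for `RandomizedSearchCV` (with an `n_iter` budget) covers the randomized variant when the grid becomes too large to enumerate.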

Section 08

Summary and Insights

This project is a strong example of a student data science project, showing how domain knowledge (astronomy) can be combined with machine learning. Highlights include a complete end-to-end workflow, clear code organization, extensive visualization, evaluation across multiple metrics, and good documentation. It can serve as a reference for data science beginners, and it shows astronomy enthusiasts how data-driven methods can be applied to a traditional problem.