
Stellar LLM Classifier: An Intelligent Star Classification System Combining Astrophysical Rules and Large Language Models

A hybrid astronomical tool that combines deterministic rule-based computation ("hard computing") with the AstroSage-8B large language model to classify the spectral types of Gaia DR3 stars automatically and generate natural language descriptions of them.

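Tags: star classification, large language models, astrophysics, Gaia DR3, AstroSage-8B, hybrid computing, spectral types, machine learning, astronomy AI, natural language generation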

Published 2026-06-08 05:12 | Recent activity 2026-06-08 05:26 | Estimated read 7 min

Section 01

Introduction: Stellar LLM Classifier—A Star Classification Tool Fusing Astrophysics and Large Language Models

Stellar LLM Classifier is a star classification tool built on a hybrid architecture: deterministic astrophysical computation (hard computing) assigns spectral types to Gaia DR3 stars, and the AstroSage-8B large language model turns those results into natural language descriptions. The tool runs locally, which keeps data on the user's machine, and is aimed at astronomy enthusiasts and researchers without programming backgrounds who want to use advanced classification technology.

Section 02

Project Background and Source

Original author/maintainer: bennylimpid196
Source platform: GitHub
Original title: stellar-llm-classifier
Release date: June 3, 2026
Last update: June 7, 2026
Project purpose: To enable astronomy enthusiasts and researchers without programming backgrounds to easily use advanced star classification technology to process the Gaia DR3 dataset.

Section 03

Core Technology and Method Architecture

Hybrid Computing Paradigm

Hard Computing Layer: Based on astrophysical rules (such as absolute magnitude, effective temperature, surface gravity), it performs standardized classification according to the Morgan-Keenan (MK) spectral classification system, ensuring physical interpretability and rigor.
Soft Computing Layer: Calls the fine-tuned AstroSage-8B large language model (8 billion parameters, trained on astronomical literature) to convert technical classification data into professional natural language descriptions.
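To make the hard computing layer concrete, here is a minimal Python sketch of a rule-based classifier. It is not the project's code. The temperature boundaries are approximate textbook values for the main spectral classes, the surface-gravity cut points are rough illustrative assumptions, and all function names are hypothetical.

```python
import math

# Approximate textbook lower bounds on effective temperature (K) for each
# spectral class. Assumed for illustration; the project's thresholds are
# not given in the source.
TEFF_LOWER_BOUNDS = [
    ("O", 30000.0),
    ("B", 10000.0),
    ("A", 7500.0),
    ("F", 6000.0),
    ("G", 5200.0),
    ("K", 3700.0),
    ("M", 2400.0),
]


def spectral_class(teff_k):
    """Return the spectral class letter for an effective temperature in kelvin."""
    for letter, lower in TEFF_LOWER_BOUNDS:
        if teff_k >= lower:
            return letter
    return "unclassified"  # cooler than the M range covered here


def luminosity_class(logg):
    """Rough luminosity class from surface gravity log g (cgs, dex).

    The cut points are illustrative assumptions, not calibrated values.
    """
    if logg >= 4.0:
        return "V"    # dwarf (main sequence)
    if logg >= 3.5:
        return "IV"   # subgiant
    if logg >= 1.5:
        return "III"  # giant
    return "I"        # supergiant


def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax in
    milliarcseconds, ignoring extinction: M = m + 5*log10(parallax) - 10."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return apparent_mag + 5.0 * math.log10(parallax_mas) - 10.0


def mk_label(teff_k, logg):
    """Combine the temperature class and luminosity class into one label."""
    return f"{spectral_class(teff_k)} {luminosity_class(logg)}"
```

Because every rule is an explicit threshold, each output can be traced back to the input values that produced it, which is the interpretability benefit the text attributes to this layer.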

Data Processing Flow

Data Import: Upload a CSV file containing Gaia DR3 data
Parameter Validation: Check required fields such as absolute magnitude, effective temperature, and surface gravity
Rule-based Classification: The hard computing layer determines the MK spectral type
Intelligent Description: The soft computing layer generates natural language descriptions
Result Export: Output a report containing spectral types and descriptions
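The five steps above could be wired together roughly as follows. This is a sketch under assumptions: `teff_gspphot` and `logg_gspphot` are modelled on Gaia DR3 `gaia_source` field names, `abs_mag` is a hypothetical column, and the tool's actual required CSV headers are not given in the source. The classifier and the description generator are passed in as plain callables so that the sketch stays independent of any particular rule set or model.

```python
import csv
import io

# Assumed column names (see lead-in); "abs_mag" is hypothetical.
REQUIRED_COLUMNS = ("source_id", "teff_gspphot", "logg_gspphot", "abs_mag")


def validate_row(row):
    """Return a list of problems for one CSV row (empty list means valid)."""
    problems = []
    for name in REQUIRED_COLUMNS:
        value = (row.get(name) or "").strip()
        if not value:
            problems.append(f"missing {name}")
        elif name != "source_id":
            try:
                float(value)
            except ValueError:
                problems.append(f"non-numeric {name}")
    return problems


def run_pipeline(csv_text, classify, describe):
    """Import, validate, classify, and describe; return (results, rejected)."""
    results, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        problems = validate_row(row)
        if problems:
            rejected.append((row.get("source_id"), problems))
            continue
        label = classify(
            float(row["teff_gspphot"]),
            float(row["logg_gspphot"]),
            float(row["abs_mag"]),
        )
        results.append({
            "source_id": row["source_id"],
            "spectral_type": label,
            "description": describe(label),
        })
    return results, rejected


def export_csv(results):
    """Serialize the classified rows to CSV text for a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["source_id", "spectral_type", "description"]
    )
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()
```

Rows that fail validation are collected separately rather than dropped silently, so a report can state which stars were skipped and why.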

Section 04

Model Performance Verification Results

Core Metrics for Version V6 (Test Set: 498 Gaia DR3 Stars)

Metric	Value
Accuracy	0.7579
Cohen's Kappa Coefficient	0.7083
Macro-average F1 Score	0.6710
Near Misses (Distance 1)	0.9976
Mean Absolute Error (ΔTeff)	248.0 K
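For readers who want to see how metrics of this kind are computed, the following Python sketch implements them from paired true and predicted labels. It is not the project's evaluation code. In particular, reading "Near Misses (Distance 1)" as the fraction of predictions that land within one spectral class of the true class is an assumption, as is the label ordering in `CLASS_ORDER`.

```python
from collections import Counter

# Assumed label set, ordered from hottest to coolest.
CLASS_ORDER = ["O", "B", "A", "F", "G", "K", "M"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (p_o - p_e) / (1.0 - p_e)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def within_one_class(y_true, y_pred):
    """Fraction of predictions at most one step from the truth in CLASS_ORDER
    (an assumed reading of the 'near misses' metric)."""
    idx = {c: i for i, c in enumerate(CLASS_ORDER)}
    hits = sum(abs(idx[t] - idx[p]) <= 1 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def mean_abs_error(true_values, predicted_values):
    """Mean absolute difference, e.g. between true and estimated Teff in K."""
    diffs = [abs(t - p) for t, p in zip(true_values, predicted_values)]
    return sum(diffs) / len(diffs)
```

Macro-averaging gives rare classes the same weight as common ones, which is one plausible reason a macro F1 score can sit below the overall accuracy on an imbalanced test set.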

Confidence Interval and Version Evolution

Bootstrap 95% confidence interval for the error: [0.135, 0.205], which the project describes as showing good stability
Version Iteration: Across versions V1 to V7, the system prompts and verification strategies were refined; V7 reaches an accuracy of 0.7951 and a mean absolute error of 212.2 K, compared with 0.7579 and 248.0 K for V6
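A bootstrap confidence interval of this kind can be obtained by resampling the per-star results with replacement and recomputing the statistic each time. The sketch below uses the simple percentile method. Which bootstrap variant and which error statistic the project used is not stated in the source, so the function names and defaults here are assumptions.

```python
import random


def mean_of(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def bootstrap_ci(values, stat=mean_of, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values).

    values      : per-star outcomes, e.g. 1 for a misclassification, 0 otherwise
    stat        : function mapping a list of values to a single number
    n_resamples : number of bootstrap resamples drawn with replacement
    alpha       : 0.05 gives a 95% interval
    seed        : fixed so that repeated runs give the same interval
    """
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return stats[lo_idx], stats[hi_idx]
```

For an error rate, `values` would be a list of 0/1 flags, one per test star. A narrow interval suggests the estimate would change little on a different sample of similar stars.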

Section 05

Application Scenarios and Usage Guide

Target Users

Astronomy Enthusiasts: Analyze star data without programming knowledge
Educators: Demonstrate star classification concepts in teaching
Researchers: Batch process the Gaia DR3 dataset
Data Scientists: Explore the combination of astronomical data and natural language generation

Usage Steps

Download the installer (.exe) from GitHub Releases
Run the installation wizard to complete the installation
Launch the application and import a CSV file in the correct format
Select "Classify stars" to start classification
Export the results to a spreadsheet

System Requirements

OS: Windows 10/11
Processor: Intel Core i5/AMD Ryzen 5 (4 cores or more)
Memory: 8GB (16GB recommended)
Storage: 5GB of available space
Graphics Card: Discrete graphics card optional (improves performance)

Section 06

Project Significance and Technical Insights

Scientific Value

The project is a notable attempt in astronomical data processing to integrate traditional deterministic algorithms with generative AI. It retains the rigor of physical rules while drawing on the expressive power of LLMs, and it suggests new approaches to visualizing and communicating scientific data.

Technical Insights

Deterministic + Generative: Balances scientific accuracy and user-friendly output
Local Processing: Protects sensitive data and supports offline work
Domain Fine-tuning: Fine-tuning a general LLM on a specialist corpus can significantly improve its performance in that domain
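One way to picture the "deterministic + generative" split is to treat the rule-based label as fixed input to the language model and to check the generated text against it afterwards. The Python sketch below illustrates that idea only. The prompt wording, the record fields, and the `generate` callable (any function from prompt string to text, standing in for a local model call) are all hypothetical and not taken from the project.

```python
def build_prompt(record):
    """Build a prompt that passes the rule-based result to the model as fixed data."""
    return (
        "Describe the following star for a general audience. "
        "Do not change the classification.\n"
        f"Gaia DR3 source_id: {record['source_id']}\n"
        f"Spectral type (rule-based, fixed): {record['spectral_type']}\n"
        f"Effective temperature: {record['teff_k']:.0f} K\n"
        f"Surface gravity log g: {record['logg']:.2f}\n"
    )


def describe_star(record, generate):
    """Generate a description and keep the deterministic label authoritative.

    generate: hypothetical callable (prompt -> text) wrapping the language model.
    """
    text = generate(build_prompt(record)).strip()
    label = record["spectral_type"]
    # Naive consistency check: if the generated text never states the
    # rule-based label, prepend it so the report cannot omit it.
    if label not in text:
        text = f"{label} star (rule-based classification). {text}"
    return {**record, "description": text}
```

With this arrangement the model affects only the wording of the report, while the spectral type shown to the user always comes from the deterministic layer.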

Section 07

Limitations and Improvement Suggestions

Current Limitations

Primarily optimized for Gaia DR3 data; results may be inconsistent when using other data sources

Improvement Directions

Expand support for more data sources
Further optimize the model's classification accuracy for edge spectral types