Zing Forum

Stellar LLM Classifier: An Intelligent Star Classification System Combining Astrophysical Rules and Large Language Models

A hybrid-architecture astronomy tool that combines deterministic hard computing with the AstroSage-8B large language model to classify the spectral types of Gaia DR3 stars automatically and generate natural language descriptions of them.

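Tags: Star classification · Large language models · Astrophysics · Gaia DR3 · AstroSage-8B · Hybrid computing · Spectral type · Machine learning · Astronomy AI · Natural language generation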
Published 2026-06-08 05:12 | Recent activity 2026-06-08 05:26 | Estimated read 7 min

Section 01

Introduction: Stellar LLM Classifier, a Star Classification Tool Fusing Astrophysics and Large Language Models

Stellar LLM Classifier is a star classification tool built on a hybrid architecture. Traditional deterministic astrophysical computation (hard computing) assigns spectral types to Gaia DR3 stars, and the AstroSage-8B large language model generates natural language descriptions of the results. The tool runs locally, which keeps data private and secure, and it is aimed at astronomy enthusiasts and researchers without a programming background who want to use advanced classification technology.

Section 02

Project Background and Source

  • Original author/maintainer: bennylimpid196
  • Source platform: GitHub
  • Original title: stellar-llm-classifier
  • Release date: June 3, 2026
  • Last update: June 7, 2026
  • Project purpose: To enable astronomy enthusiasts and researchers without programming backgrounds to easily use advanced star classification technology to process the Gaia DR3 dataset.

Section 03

Core Technology and Method Architecture

Hybrid Computing Paradigm

  • Hard Computing Layer: Applies astrophysical rules to physical parameters (such as absolute magnitude, effective temperature, and surface gravity) and assigns a class under the Morgan-Keenan (MK) spectral classification system, which keeps the result physically interpretable and rigorous.
  • Soft Computing Layer: Calls the fine-tuned AstroSage-8B large language model (8 billion parameters, trained on astronomical literature) to convert the technical classification data into professional natural language descriptions.
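
The split between the two layers can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the temperature boundaries are approximate textbook values for the MK letter classes, the log g cut of 3.5 is a rough dwarf/giant divider, and `Star`, `mk_class`, `luminosity_hint`, and `build_prompt` are hypothetical names, not the project's code. The sketch stops at building the prompt text and does not call AstroSage-8B.

```python
from dataclasses import dataclass

# Approximate lower Teff bounds (K) for the MK letter classes.
# Textbook-style values, NOT the project's actual thresholds.
MK_TEFF_BOUNDS = [
    ("O", 30000.0),
    ("B", 10000.0),
    ("A", 7500.0),
    ("F", 6000.0),
    ("G", 5200.0),
    ("K", 3700.0),
    ("M", 0.0),
]


@dataclass
class Star:
    source_id: str
    teff: float     # effective temperature in K
    logg: float     # surface gravity, log10 of cm/s^2
    abs_mag: float  # absolute G-band magnitude


def mk_class(teff: float) -> str:
    """Hard layer: deterministic letter class from effective temperature."""
    for letter, lower_bound in MK_TEFF_BOUNDS:
        if teff >= lower_bound:
            return letter
    raise ValueError("teff must be non-negative")


def luminosity_hint(logg: float) -> str:
    """Rough dwarf/giant split; the 3.5 cut is an illustrative assumption."""
    return "dwarf (V)" if logg >= 3.5 else "giant or subgiant"


def build_prompt(star: Star) -> str:
    """Soft layer input: the rule-based result is passed to the LLM as fixed facts."""
    return (
        f"Star {star.source_id}: spectral class {mk_class(star.teff)}, "
        f"{luminosity_hint(star.logg)}, Teff = {star.teff:.0f} K, "
        f"log g = {star.logg:.2f}, M_G = {star.abs_mag:.2f}. "
        "Describe this star in two sentences without changing the classification."
    )


sun_like = Star("example-1", teff=5772.0, logg=4.44, abs_mag=4.67)
print(build_prompt(sun_like))
```

The point of this arrangement is that the language model only rephrases a classification that the rules have already fixed, so a fluent but wrong description cannot change the reported spectral type.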

Data Processing Flow

  1. Data Import: Upload a CSV file containing Gaia DR3 data
  2. Parameter Validation: Check required fields such as absolute magnitude, effective temperature, and surface gravity
  3. Rule-based Classification: The hard computing layer determines the MK spectral type
  4. Intelligent Description: The soft computing layer generates natural language descriptions
  5. Result Export: Output a report containing spectral types and descriptions
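
The five steps above can be sketched as a small pipeline. This is a hedged illustration, not the tool's implementation: the source does not state the expected CSV schema, so the column names below are borrowed from the Gaia DR3 `gaia_source` table as an assumption, the absolute magnitude is derived from apparent G magnitude and parallax while ignoring extinction, and `run_pipeline`, `classify`, and `describe` are hypothetical names. The two layers are passed in as callables so that the rule-based and LLM parts stay separate.

```python
import csv
import io
import math

# Assumed input columns (Gaia DR3 gaia_source names); the tool's real schema may differ.
REQUIRED = ("source_id", "phot_g_mean_mag", "parallax", "teff_gspphot", "logg_gspphot")


def absolute_g_mag(g_mag: float, parallax_mas: float) -> float:
    """M_G = G + 5*log10(parallax in mas) - 10, ignoring extinction."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return g_mag + 5.0 * math.log10(parallax_mas) - 10.0


def run_pipeline(csv_text, classify, describe):
    """Import, validate, classify, describe, export. Returns (csv_output, rejected_rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))            # step 1: import
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["source_id", "abs_g_mag", "spectral_type", "description"]
    )
    writer.writeheader()
    rejected = []

    # Header is line 1, so data rows start at line 2 (assumes no multi-line fields).
    for line_no, row in enumerate(reader, start=2):
        try:                                                  # step 2: validate
            teff = float(row["teff_gspphot"])
            logg = float(row["logg_gspphot"])
            abs_mag = absolute_g_mag(float(row["phot_g_mean_mag"]), float(row["parallax"]))
            if not all(math.isfinite(v) for v in (teff, logg, abs_mag)):
                raise ValueError("non-finite parameter")
        except (TypeError, ValueError) as exc:
            rejected.append((line_no, str(exc)))
            continue

        spectral_type = classify(teff, logg, abs_mag)         # step 3: hard layer
        text = describe(row["source_id"], spectral_type)      # step 4: soft layer
        writer.writerow({                                     # step 5: export
            "source_id": row["source_id"],
            "abs_g_mag": round(abs_mag, 3),
            "spectral_type": spectral_type,
            "description": text,
        })
    return out.getvalue(), rejected
```

Rows that fail validation are collected with their line number rather than aborting the run, which suits batch processing of large catalog extracts where some sources lack a temperature or gravity estimate.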

Section 04

Model Performance Verification Results

Core Metrics for Version V6 (Test Set: 498 Gaia DR3 Stars)

Metric                          Value
Accuracy                        0.7579
Cohen's Kappa Coefficient       0.7083
Macro-average F1 Score          0.6710
Near Misses (Distance 1)        0.9976
Mean Absolute Error (ΔTeff)     248.0 K
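
For readers unfamiliar with these metrics, the sketch below shows how each could be computed from true and predicted labels. It is an illustration under stated assumptions, not the project's evaluation script: the function names are hypothetical, "near miss" is assumed to mean a prediction at most one step from the true class along the O-B-A-F-G-K-M sequence, and the sample labels are made up.

```python
import math
from collections import Counter

MK_ORDER = "OBAFGKM"  # hot to cool; used to measure distance between classes


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_obs - p_exp) / (1 - p_exp)."""
    n = len(y_true)
    p_obs = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_exp = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_exp == 1.0:
        return math.nan  # undefined when both label sets contain a single class
    return (p_obs - p_exp) / (1.0 - p_exp)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def near_miss_rate(y_true, y_pred, max_distance=1):
    """Share of predictions within max_distance steps of the true class."""
    hits = sum(
        abs(MK_ORDER.index(t) - MK_ORDER.index(p)) <= max_distance
        for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)


def mean_abs_error(true_values, predicted_values):
    return sum(abs(t - p) for t, p in zip(true_values, predicted_values)) / len(true_values)


# Made-up labels for illustration only.
y_true = list("GGKKMF")
y_pred = list("GKKKMA")
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred), macro_f1(y_true, y_pred))
```

Under that reading of "near miss", a macro-average F1 well below the accuracy alongside a near-miss rate close to 1 would suggest that errors concentrate in a few classes and almost always land on a neighbouring class.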

Confidence Interval and Version Evolution

  • Bootstrap 95% confidence interval on the error: [0.135, 0.205], which the project describes as showing good stability
  • Version Iteration: From V1 to V7, system prompts and verification strategies were optimized; V7 raises accuracy from V6's 0.7579 to 0.7951 and lowers the mean absolute error from 248.0 K to 212.2 K
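
A percentile bootstrap interval of the kind mentioned above can be sketched as follows. The source does not say which statistic was resampled or how, so this example is an assumption throughout: it bootstraps the share of correct predictions on synthetic data, and `bootstrap_ci` with its parameters is a hypothetical helper.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n items with replacement and record the resampled mean.
        resample = rng.choices(values, k=n)
        means.append(sum(resample) / n)
    means.sort()
    lower_idx = max(0, int((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lower_idx], means[upper_idx]


# Synthetic example: 400 predictions, 300 of them correct (1 = correct, 0 = wrong).
correct_flags = [1] * 300 + [0] * 100
low, high = bootstrap_ci(correct_flags)
print(f"95% CI for accuracy: [{low:.3f}, {high:.3f}]")
```

The same helper yields an interval for an error rate if the flags are inverted, and a narrow interval relative to the point estimate is what a claim of stability would rest on.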

Section 05

Application Scenarios and Usage Guide

Target Users

  • Astronomy Enthusiasts: Analyze star data without programming knowledge
  • Educators: Demonstrate star classification concepts in teaching
  • Researchers: Batch process the Gaia DR3 dataset
  • Data Scientists: Explore the combination of astronomical data and natural language generation

Usage Steps

  1. Download the installer (.exe) from GitHub Releases
  2. Run the installation wizard to complete the installation
  3. Launch the application and import a CSV file in the correct format
  4. Select "Classify stars" to start classification
  5. Export the results to a spreadsheet

System Requirements

  • OS: Windows 10/11
  • Processor: Intel Core i5/AMD Ryzen 5 (4 cores or more)
  • Memory: 8 GB (16 GB recommended)
  • Storage: 5 GB of available space
  • Graphics Card: Discrete graphics card optional (improves performance)

Section 06

Project Significance and Technical Insights

Scientific Value

The project represents an important attempt in astronomical data processing: it integrates traditional deterministic algorithms with generative AI, retaining the rigor of physical rules while using the expressive power of LLMs, and it offers new ideas for presenting and disseminating scientific data.

Technical Insights

  • Deterministic + Generative: Balances scientific accuracy and user-friendly output
  • Local Processing: Protects sensitive data and supports offline work
  • Domain Fine-tuning: Fine-tuning a general LLM on a specialist corpus can significantly improve its performance in that domain

Section 07

Limitations and Improvement Suggestions

Current Limitations

  • Primarily optimized for Gaia DR3 data; results may be inconsistent when using other data sources

Improvement Directions

  • Expand support for more data sources
  • Further optimize the model's classification accuracy for edge spectral types