# When Large Language Models Meet Molecular Structure Generation: New Explorations in AI-Assisted Material Discovery

> This article introduces an open-source project that uses large language models (LLMs) to generate molecular structures and then refines them with DFT optimization, exploring the potential and limitations of LLMs in materials science.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-24T12:15:05.000Z
- Last activity: 2026-05-24T12:17:15.226Z
- Popularity: 147.0
- Keywords: large language models, molecular structure generation, DFT calculations, material discovery, AI for Science, computational chemistry
- Page link: https://www.zingnex.cn/en/forum/thread/ai-aca6ff25
- Canonical: https://www.zingnex.cn/forum/thread/ai-aca6ff25

---

## Introduction: Exploration of an Open-Source Project for LLM+DFT-Assisted Molecular Structure Generation

### Core Project Overview
Molecular-Identificatio is an open-source project maintained by Kris2lund (GitHub: https://github.com/Kris2lund/Molecular-Identificatio, released on May 24, 2026). It tests whether large language models (LLMs) can generate molecular structures that serve as starting points for density functional theory (DFT) optimization. The project evaluates how closely the resulting structures match real molecules in geometric configuration and electronic properties, and offers a methodological reference for AI-assisted material discovery.

### Core Idea
The idea is to have an LLM output a molecule's 3D coordinates directly and to use them as the initial guess for a DFT calculation. The DFT-optimized structure is then compared with the reference structure from the PubChem database to gauge the potential and limitations of LLMs in materials science.

## Background and Motivation: Challenges in Material Discovery and the Potential of LLMs

Materials science and computational chemistry have long faced a core challenge: how to efficiently discover and validate new molecules with specific properties. Traditional molecular structure generation relies on complex physical models and expert knowledge. DFT calculations are accurate, but they need a good initial structure guess to converge. In recent years, the ability of LLMs to generate text, code, and structured data has led the scientific community to ask: can LLMs directly generate molecular structures as initial guesses for DFT?

## Core Methodology: Evaluation Process and Metrics

### Evaluation Process
1. Have the LLM directly output the 3D coordinates of a molecule
2. Feed the generated structure into a DFT calculation as the initial guess for optimization
3. Compare the optimized result with the reference structure from the PubChem database along multiple dimensions
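
As a rough illustration of step 1, the sketch below shows one way an LLM reply could be turned into coordinates suitable for a DFT input. It assumes the model answers with XYZ-style lines (an element symbol followed by three numbers). The function names `parse_xyz_block` and `to_xyz` and the water coordinates are hypothetical illustrations, not taken from the project's code.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z) in angstroms


def parse_xyz_block(text: str) -> List[Atom]:
    """Collect XYZ-style lines such as 'C 0.0 1.2 -0.5' and skip everything else."""
    atoms: List[Atom] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # atom-count lines, comments, and surrounding prose
        symbol = parts[0]
        if not symbol.isalpha() or len(symbol) > 2:
            continue  # element symbols are one or two letters
        try:
            x, y, z = (float(p) for p in parts[1:])
        except ValueError:
            continue  # four words that are not coordinates
        atoms.append((symbol.capitalize(), x, y, z))
    return atoms


def to_xyz(atoms: List[Atom], comment: str = "") -> str:
    """Serialize atoms as XYZ text: atom count, comment line, one atom per line."""
    lines = [str(len(atoms)), comment]
    lines += [f"{s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in atoms]
    return "\n".join(lines)


# Hypothetical LLM reply; the coordinates are illustrative only.
reply = """Here is a water molecule:
3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
atoms = parse_xyz_block(reply)
print(to_xyz(atoms, comment="LLM initial guess"))
```

Filtering line by line, rather than trusting the declared atom count, tolerates extra prose around the coordinates. A real pipeline would probably also check that the parsed atoms match the expected molecular formula before spending DFT time on them.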

### Evaluation Metrics
- **Structural Similarity**: root-mean-square deviation (RMSD) measures atomic position deviations and is used to calculate success rates
- **Electronic Properties**: differences in HOMO-LUMO energy gaps and in energies after DFT optimization
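
The structural metric can be sketched as follows. This is a minimal example that assumes both structures list the same atoms in the same order and uses the Kabsch algorithm to remove rigid translation and rotation before computing RMSD. The source does not say which alignment the project uses or what RMSD cutoff counts as a success, so `kabsch_rmsd`, `success_rate`, and the `threshold` argument are assumptions made for illustration.

```python
from typing import Sequence

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid alignment.

    Assumes row i of p and row i of q describe the same atom.
    """
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both structures on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip one axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rot.T
    return float(np.sqrt(np.mean(np.sum((p_rot - q_c) ** 2, axis=1))))


def success_rate(rmsds: Sequence[float], threshold: float) -> float:
    """Fraction of molecules whose RMSD is at or below a chosen threshold."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)


# Sanity check: a rotated and shifted copy of a structure should give RMSD near 0.
ref = np.array([[0.0, 0.0, 0.117], [0.0, 0.757, -0.469], [0.0, -0.757, -0.469]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(ref, moved))
```

Without the alignment step, two identical molecules placed in different orientations would show a large RMSD, which would understate how well the generated geometry matches the reference.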

## Model Selection and Experimental Design

The project tests two mainstream LLMs, Gemini 2.5 Flash and GPT-5.4, which represent different architectural routes and training strategies. Comparing their performance shows how model characteristics affect molecular generation tasks. The experimental design takes molecular diversity and complexity into account so that the evaluation results are statistically meaningful.
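
A per-model comparison of this kind might be aggregated as in the sketch below. It assumes closed-shell molecules, orbital energies in hartree, and one record per molecule and model. The `Result` fields, the function names, the threshold, and every number in the usage example are placeholders invented for illustration, not data or code from the project.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Sequence

HARTREE_TO_EV = 27.211386  # unit conversion factor


def homo_lumo_gap_ev(orbital_energies_hartree: Sequence[float], n_electrons: int) -> float:
    """HOMO-LUMO gap in eV for a closed-shell molecule (two electrons per orbital)."""
    if n_electrons % 2 != 0:
        raise ValueError("this sketch only handles an even electron count")
    energies = sorted(orbital_energies_hartree)
    homo = n_electrons // 2 - 1  # index of the highest occupied orbital
    if homo < 0 or homo + 1 >= len(energies):
        raise ValueError("need at least one occupied and one virtual orbital")
    return (energies[homo + 1] - energies[homo]) * HARTREE_TO_EV


class Result(NamedTuple):
    model: str         # label of the LLM that produced the initial guess
    rmsd: float        # RMSD against the reference structure, in angstroms
    gap_ev: float      # gap of the DFT-optimized generated structure
    ref_gap_ev: float  # gap of the reference structure


def summarize(results: Sequence[Result], rmsd_threshold: float) -> Dict[str, Dict[str, float]]:
    """Group results by model and report success rate and mean absolute gap error."""
    by_model: Dict[str, List[Result]] = defaultdict(list)
    for r in results:
        by_model[r.model].append(r)
    return {
        model: {
            "n": len(rows),
            "success_rate": sum(r.rmsd <= rmsd_threshold for r in rows) / len(rows),
            "mean_abs_gap_error_ev": mean(abs(r.gap_ev - r.ref_gap_ev) for r in rows),
        }
        for model, rows in by_model.items()
    }


# Placeholder records only; these are not measurements from the project.
demo = [
    Result("model_a", 0.2, 6.0, 6.5),
    Result("model_a", 0.8, 7.0, 6.5),
    Result("model_b", 0.1, 5.0, 5.0),
]
print(summarize(demo, rmsd_threshold=0.5))
```

Reporting the sample size `n` next to each rate makes it easier to judge whether a difference between two models rests on enough molecules to be meaningful.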

## Technical Implementation and Code Structure

The project repository is organized into three main directories:
- **codes**: Code implementation modules
- **data**: PubChem reference structures and raw data generated by the LLMs
- **figures**: Visualization results, such as RMSD distribution charts and energy gap comparison charts

This modular layout makes the work easier to reproduce and extend.

## Significance and Outlook: Future Directions of AI-Assisted Material Discovery

This project represents an important direction in AI for Science:
- If LLMs can reliably generate molecular structures, the barrier to material discovery will drop significantly, accelerating progress in fields such as drug development and catalyst design
- It provides experimental evidence for understanding the spatial reasoning ability of LLMs

In the future, with larger models and training on richer chemical corpora, AI-assisted material discovery could become a standard laboratory process.

## Conclusion: Value and Expectations of Early Exploration

Although the Molecular-Identificatio project is still at an early stage, the direction it explores has clear scientific value and application prospects. As LLM capabilities continue to improve, AI is expected to play an increasingly important role in molecular design and materials science.
