Commonsense-Driven Transformer Fine-Tuning: Enabling LLMs to Generate More Coherent Stories

An NLP and generative AI system that uses LoRA technology to fine-tune three large language models, integrates commonsense reasoning capabilities for short story generation, and is trained on the ROCStories dataset with evaluation metrics including BLEU, ROUGE, BERTScore, and perplexity.

Large Language Models, LoRA, Fine-Tuning, Commonsense Reasoning, Story Generation, Transformer, Generative AI, NLP, ROCStories

Published 2026-06-13 06:41 | Recent activity 2026-06-13 06:57 | Estimated read 10 min

Section 01

Introduction: Commonsense-Driven Transformer Fine-Tuning Improves Story Generation Coherence

Original Author/Maintainer: nithin-jella
Source Platform: GitHub
Original Title: Commonsense-Driven-Fine-Tuning-of-Transformer-Models-for-Coherent-Story-Generation
Original Link: https://github.com/nithin-jella/Commonsense-Driven-Fine-Tuning-of-Transformer-Models-for-Coherent-Story-Generation
Publication Time: 2026-06-12

This project targets the logical breaks and commonsense violations that appear when large language models generate stories. It fine-tunes three large language models of different architectures with LoRA, integrates commonsense reasoning, trains on the ROCStories dataset, and evaluates with BLEU, ROUGE, BERTScore, and perplexity, with the goal of generating more coherent and plausible short stories.

Section 02

Research Background and Motivation

Large language models excel at text generation, but when asked for longer coherent stories they often produce logical breaks, implausible plots, and character behavior that violates commonsense. A likely root cause is that models mainly learn surface statistical patterns of text and lack explicit causal and commonsense knowledge. For example, a model might generate "Xiao Ming put ice cubes into hot tea, and the ice cubes became larger," which violates physical commonsense. This project aims to improve the coherence and plausibility of generated stories by injecting commonsense reasoning capabilities.

Section 03

Technical Solution: LoRA Fine-Tuning and Commonsense Integration

Model Selection and Fine-Tuning Strategy

Three representative large language models of different architectures and scales are selected for comparison, to test how well the method generalizes. LoRA (Low-Rank Adaptation) is used for parameter-efficient fine-tuning: the pre-trained weights stay frozen and only a small number of low-rank matrices are trained. Its advantages include low compute and memory cost during training, small adapter checkpoints, no added inference latency once the adapter is merged into the base weights, and a reduced risk of catastrophic forgetting.
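To make the parameter saving concrete, here is a minimal sketch in plain Python (not the project's code). It applies a LoRA-style update, W + (alpha / r) * B * A, to a tiny weight matrix and counts trainable parameters. The matrix sizes, the rank `r`, and the scaling `alpha` are illustrative assumptions; the source does not state the values the project uses.

```python
# Minimal LoRA sketch in plain Python (illustrative; not the project's code).
# The frozen weight W has shape (d_out, d_in). LoRA trains two small matrices,
# A with shape (r, d_in) and B with shape (d_out, r), and uses
# W + (alpha / r) * B @ A as the effective weight.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Trainable parameters for one matrix: (full fine-tuning, LoRA)."""
    return d_out * d_in, r * (d_in + d_out)

# Tiny example: a 2x3 frozen weight with a rank-1 update.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # shape (1, 3)
B = [[0.5], [0.0]]      # shape (2, 1); real LoRA starts B at zero, non-zero here to show an effect
W_eff = lora_effective_weight(W, A, B, alpha=2.0)

# Assumed sizes: one 4096 x 4096 projection matrix with rank 8.
full, lora = trainable_params(4096, 4096, 8)
print(W_eff)
print(full, lora, full // lora)
```

Under these assumed sizes, the LoRA adapter for that one matrix trains 256 times fewer parameters than full fine-tuning, which is where the compute and storage savings come from.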

Commonsense Reasoning Integration

Source of Commonsense Knowledge: Use existing commonsense knowledge bases and reasoning datasets to provide prior knowledge such as physics, social norms, and causal relationships.
Training Data: Build training samples based on the ROCStories dataset (five-sentence short stories, manually verified to conform to commonsense).
Loss Function: On top of the standard language modeling loss, an auxiliary loss for commonsense consistency may be introduced to achieve multi-objective optimization.
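The multi-objective loss can be sketched as a weighted sum of two terms. This is an assumption about its form: the source only says an auxiliary loss "may be introduced". The weight `lam` and the auxiliary term (the negative log of a hypothetical plausibility score) are illustrative and not taken from the project.

```python
import math

def lm_loss(token_probs):
    """Standard language modeling loss: average negative log-likelihood
    of the reference tokens under the model."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def commonsense_penalty(plausibility):
    """Hypothetical auxiliary term: -log of a plausibility score in (0, 1].
    A score of 1.0 adds no penalty; lower scores add more."""
    return -math.log(plausibility)

def total_loss(token_probs, plausibility, lam=0.1):
    """Weighted sum of the two objectives; lam is an assumed hyperparameter."""
    return lm_loss(token_probs) + lam * commonsense_penalty(plausibility)

# Same token probabilities, different (made-up) plausibility scores.
probs = [0.5, 0.25, 0.5]
loss_plausible = total_loss(probs, plausibility=1.0)
loss_implausible = total_loss(probs, plausibility=0.1)
print(loss_plausible, loss_implausible)
```

With identical token probabilities, the story judged less plausible receives the higher total loss, so gradient updates are pushed away from fluent but implausible continuations. How the plausibility score would be obtained (for example from a commonsense knowledge base or a separate classifier) is left open here, as it is in the source.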

Section 04

Evaluation System: Multi-Dimensional Measurement of Generation Quality

Automatic Evaluation Metrics

BLEU: Measures the n-gram overlap between generated text and reference text, reflecting lexical similarity.
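ROUGE: Recall-oriented overlap between generated and reference text; ROUGE-L uses the longest common subsequence, so it rewards matching word order and sentence-level structure.
BERTScore: Semantic similarity computed from the contextual embeddings of a pre-trained model; it tends to track human judgment more closely than n-gram overlap metrics.
Perplexity: Measures how well a language model predicts a text (the exponential of the average negative log-likelihood per token); lower perplexity generally indicates more fluent text, though it does not by itself guarantee grammatical or logical correctness.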
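The overlap metrics and perplexity above are simple enough to sketch in plain Python. The block below is a simplified illustration, not the project's evaluation code: `ngram_precision` is only the clipped-precision building block of BLEU (no brevity penalty or geometric mean), `rouge_l_f1` is a sentence-level ROUGE-L F-measure, and BERTScore is omitted because it needs a pre-trained model. The example sentences are made up.

```python
import math
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the building block of BLEU."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def lcs_length(a, b):
    """Length of the longest common subsequence, via dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L as the F1 of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)

def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

reference = "she put the ice in the tea and it melted".split()
candidate = "she put ice in the tea and it melted".split()

print(ngram_precision(candidate, reference, 1))
print(ngram_precision(candidate, reference, 2))
print(rouge_l_f1(candidate, reference))
print(perplexity([math.log(0.25)] * 4))
```

Note that none of these scores checks whether a story makes sense: a sentence that swaps "melted" for "became larger" would lose very little overlap, which is why the commonsense-specific evaluation below is needed.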

Commonsense Consistency Evaluation

Human Evaluation: Human judges assess logical rationality and commonsense compliance.
Adversarial Testing: Design test cases to check if commonsense-violating content is avoided.
Comparative Experiments: Compare with baseline models without commonsense enhancement.
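One way to make the adversarial test concrete, sketched here under assumptions: pair each story context with a plausible ending and a commonsense-violating ending, then count how often a scoring function prefers the plausible one. The `score` callable is hypothetical; in practice it could be a model's log-likelihood of the ending given the context. The toy scorer and the second test case exist only so the sketch runs.

```python
from typing import Callable, List, Tuple

# (context, plausible ending, implausible ending)
Case = Tuple[str, str, str]

def preference_accuracy(cases: List[Case], score: Callable[[str, str], float]) -> float:
    """Fraction of cases where the plausible ending outscores the implausible one."""
    wins = sum(1 for ctx, good, bad in cases if score(ctx, good) > score(ctx, bad))
    return wins / len(cases)

def toy_score(context: str, ending: str) -> float:
    """Stand-in scorer: penalizes a hand-written list of implausible phrases.
    A real test would use a language model's score instead."""
    implausible = ("became larger", "froze in the sun")
    return -float(sum(phrase in ending for phrase in implausible))

def constant_score(context: str, ending: str) -> float:
    """Stand-in baseline with no preference between endings."""
    return 0.0

cases = [
    ("Xiao Ming put ice cubes into hot tea.",
     "The ice cubes melted.",
     "The ice cubes became larger."),
    ("Anna left her ice cream in the sun.",
     "It melted quickly.",
     "It froze in the sun."),
]

accuracy = preference_accuracy(cases, toy_score)
baseline = preference_accuracy(cases, constant_score)
print(accuracy, baseline)
```

Running the same cases through a fine-tuned model and a baseline without commonsense enhancement gives a direct comparison of the kind described above.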

Section 05

Experimental Results and Key Findings

The source reports no specific values, so the points below are expected outcomes rather than measured results:

Commonsense Enhancement: Fine-tuned models are expected to keep their language fluency while improving logical consistency.
LoRA Applicability: The project tests whether LoRA is effective for commonsense-oriented fine-tuning, which lowers the hardware barrier for such experiments.
Multi-Model Comparison: Comparing the three models relates architecture and scale to commonsense reasoning ability, providing a reference for future optimization.

Section 06

Application Scenarios and Potential Value

Creative Writing Assistance: Provide AI assistants for authors to generate logically reasonable story frameworks, plot twists, etc.
Educational Content Generation: Automatically generate educational stories that conform to scientific commonsense, supporting large-scale production of personalized learning materials.
Dialogue System Enhancement: Improve the long-text generation ability of chatbots and maintain logical consistency.
Game Narrative Design: Generate dynamic plots for open-world games, ensuring consistency in NPC behaviors and physical rules.

Section 07

Limitations and Future Directions

Commonsense Coverage: Current knowledge bases mainly cover physics and social norms; professional domain knowledge is limited, so breadth and depth need to be expanded.
Cultural Differences: Commonsense is culturally relative, so adaptation to multilingual and multicultural scenarios is needed.
Computational Efficiency: Inference still requires high resources; practicality can be improved through model compression, quantization, etc.
Evaluation Challenges: There is a gap between automatic metrics and human judgment; better automatic evaluation methods for commonsense consistency need to be developed.

Section 08

Summary and Insights

This project is a useful step toward natural language generation that respects real-world logic rather than producing text that is merely fluent. Its technical route, parameter-efficient fine-tuning combined with commonsense injection, offers a feasible path for researchers with limited resources. For developers, the takeaway is that large-model applications need to be optimized for specific requirements such as commonsense consistency, and that "general foundation + specialized enhancement" may become a mainstream paradigm. In the future, multimodal and world-model techniques may further improve commonsense reasoning and, with it, AI story generation.