Zing Forum

Multilingual Large Model for Railway Vocational Education: An Intelligent Teaching System for International Students

A knowledge-enhanced large language model designed specifically for international vocational education in railway engineering, supporting Chinese, English, and Malay trilingual Q&A, professional tutoring, and intelligent teaching applications.

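Tags: large language model, railway engineering, vocational education, multilingual, RAG, QLoRA, knowledge base, international student education, Chinese-English bilingual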
Published 2026-06-11 13:13 | Recent activity 2026-06-11 13:21 | Estimated read 8 min

Section 01

Introduction to the Multilingual Railway Vocational Education Large Model Project

Project Basic Information

Core Insights

This project is a knowledge-enhanced large language model system designed specifically for international vocational education in railway engineering. It supports trilingual Q&A and professional tutoring in Chinese, English, and Malay. By combining Retrieval-Augmented Generation (RAG) with QLoRA fine-tuning, it targets the language barriers faced by international students and the limits of traditional teaching models. Its answers are grounded in a professional knowledge base and traceable to their sources, in support of railway talent development along the "Belt and Road" initiative.

Section 02

Project Background and Significance

With the in-depth advancement of the "Belt and Road" initiative, Chinese railway technology is accelerating its entry into the international market. More and more overseas students are coming to China to study railway engineering technology, but language barriers and the complexity of professional terminology pose significant challenges to teaching. Traditional teaching models struggle to meet cross-language and cross-cultural needs, creating an urgent demand for intelligent teaching aids.

This project addresses that need with a multilingual large model system for teaching railway knowledge. It answers questions in real time in Chinese, English, and Malay, and grounds its answers in a professional knowledge base so that the teaching content is accurate and traceable.

Section 03

System Architecture and Technical Implementation

Data Processing Pipeline

  1. DOCX Parsing: Read document paragraphs and tables while preserving metadata
  2. Text Cleaning: Remove control characters, page numbers, etc., and unify punctuation formats
  3. Clause Segmentation: Split at chapter and clause numbers, then cut each section into chunks of at most 900 characters, with a 120-character overlap between consecutive chunks
  4. Term Extraction: Extract Chinese-English term pairs from tables and parallel texts
  5. Bilingual Alignment: Process Chinese-English parallel content in the same line or adjacent paragraphs
  6. Instruction Sample Construction: Generate training samples for term translation, regulation interpretation, etc.
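
Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the page-number and heading patterns, and the use of NFKC normalization to unify punctuation are all assumptions. Only the 900-character chunk size and the 120-character overlap come from the description above.

```python
import re
import unicodedata

CHUNK_SIZE = 900  # maximum characters per chunk (from the pipeline description)
OVERLAP = 120     # characters shared by consecutive chunks

# Assumed patterns; a real corpus would need its own rules.
PAGE_NUM_RE = re.compile(r"^-?\s*\d+\s*-?$")  # "12" or "- 12 -" alone on a line
HEADING_RE = re.compile(r"^(?:\d+(?:\.\d+)*\s+\S|第[0-9一二三四五六七八九十百]+[章节条])")


def clean_text(text):
    """Remove control characters and page-number lines, unify punctuation width."""
    # NFKC folds full-width letters, digits, and punctuation such as "，" to ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters (category Cc) but keep newlines and tabs.
    text = "".join(c for c in text if c in "\n\t" or unicodedata.category(c) != "Cc")
    lines = [ln.strip() for ln in text.split("\n")]
    return "\n".join(ln for ln in lines if ln and not PAGE_NUM_RE.match(ln))


def split_sections(text):
    """Start a new section at every line that looks like a numbered heading."""
    sections, current = [], []
    for line in text.split("\n"):
        if HEADING_RE.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def chunk(section, size=CHUNK_SIZE, overlap=OVERLAP):
    """Cut a section into windows of `size` characters that overlap by `overlap`."""
    if len(section) <= size:
        return [section]
    step = size - overlap
    # Stop early enough that the last window is never pure overlap.
    return [section[i:i + size] for i in range(0, len(section) - overlap, step)]


if __name__ == "__main__":
    raw = "1 General\n" + "a" * 1000 + "\n- 12 -\n2 Track\nshort clause"
    for sec in split_sections(clean_text(raw)):
        print([len(c) for c in chunk(sec)])
```

Splitting on headings first keeps a chunk from straddling two clauses, and the overlap gives the retriever some context when a sentence falls on a chunk boundary.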

RAG Knowledge Retrieval Pipeline

  • Embedding Model: BAAI/bge-m3 (adapted for Chinese-English mixed text)
  • Vector Storage: FAISS IndexFlatIP (exact inner-product search; the inner product equals cosine similarity when the embeddings are L2-normalized)
  • Retrieval Logic: Return top-5 relevant segments and mark citation sources during generation
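
The retrieval logic can be illustrated without any external library. The sketch below is a pure-Python stand-in using toy vectors: in the described system the vectors would come from BAAI/bge-m3 and the search would be done by FAISS IndexFlatIP. The function names, the segment fields (`source`, `text`), and the prompt wording are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, doc_vecs, k=5):
    """Return (index, score) pairs for the k most similar documents.

    Both sides are L2-normalized first, so the inner product is the cosine similarity.
    """
    q = l2_normalize(query_vec)
    scored = [(inner_product(q, l2_normalize(d)), i) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, s) for s, i in scored[:k]]


def build_prompt(question, segments):
    """Number the retrieved segments so the model can cite them as [n]."""
    numbered = "\n".join(
        f"[{n}] ({seg['source']}) {seg['text']}" for n, seg in enumerate(segments, 1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n"
        f"Sources:\n{numbered}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(top_k([2.0, 0.0], docs, k=2))
```

Numbering the segments in the prompt is one simple way to make citations checkable: every `[n]` in the answer can be mapped back to a stored source.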

Model Selection and Training Configuration

  • Base Models: Qwen/Qwen2.5-3B-Instruct (stable VRAM usage), Qwen/Qwen2.5-7B-Instruct (better performance)
  • Single-Card RTX 3090 Configuration: 4-bit NF4 quantization, LoRA rank 16, batch size 1, sequence length 2048, etc.
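
A rough calculation shows why this configuration fits on a 24 GB card. The sketch below is a back-of-the-envelope estimate only: the parameter counts are the nominal 3B and 7B figures, the 4096 x 4096 layer shape is a hypothetical example, and the estimate ignores quantization constants, activations, optimizer state, and the KV cache.

```python
def quantized_weight_gb(n_params, bits_per_param=4):
    """Approximate weight storage in GB (1 GB = 1e9 bytes) at a given bit width."""
    return n_params * bits_per_param / 8 / 1e9


def lora_params(d_in, d_out, rank=16):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    for name, n in [("3B", 3e9), ("7B", 7e9)]:  # nominal sizes, an assumption
        print(name, quantized_weight_gb(n), "GB of 4-bit weights")
    added = lora_params(4096, 4096)  # hypothetical layer shape
    print(added, "trainable parameters,", added / (4096 * 4096), "of the full matrix")
```

Under these assumptions the 4-bit weights take about 1.5 GB and 3.5 GB, and a rank-16 adapter trains under 1% of the parameters of each adapted layer, which leaves most of the card for activations at sequence length 2048.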

Section 04

Evaluation System and Application Scenarios

Evaluation System

  • Objective Questions: Accuracy of term selection/regulation judgment/term matching
  • Subjective Questions: ROUGE-L/BLEU scores, ratings from teachers (accuracy/completeness, etc.) and students (understandability, etc.)
  • Teaching Usability: Logical organization of answers, language adapted to students' proficiency levels, bilingual term output
  • Credibility: Citation coverage, traceable conclusions, refusal to generate fabricated or dangerous content
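
Two of the automatic metrics above are simple enough to sketch directly. The code below computes accuracy for objective questions and a ROUGE-L score for subjective answers. It is an illustration under assumptions: it uses the F1 form of ROUGE-L and whitespace tokens, whereas the project's scoring variant and tokenizer are not stated (Chinese text would typically be scored per character or after word segmentation).

```python
def accuracy(predictions, answers):
    """Fraction of objective questions answered correctly."""
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    ref = "the cat is on the mat".split()
    print(round(rouge_l_f1(cand, ref), 4))
    print(accuracy(["A", "B", "C"], ["A", "B", "D"]))
```

Overlap metrics such as ROUGE-L only measure surface similarity to a reference answer, which is presumably why the evaluation also relies on teacher and student ratings.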

Application Scenarios

  1. Classroom Teaching Aid: Query term comparisons, regulation interpretations
  2. Self-Learning Support: Overseas students obtain professional answers with citations
  3. Question Bank Construction: Automatically generate single-choice questions on regulations
  4. Translation Proofreading: Use the term bank to ensure the accuracy of technical documents
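
The translation proofreading scenario can be sketched as a term-consistency check. The term bank entries, the function name, and the substring matching rule below are illustrative assumptions. A real check would need to handle inflection, synonyms, and terms that are substrings of other words.

```python
# Hypothetical three-entry term bank (Chinese term -> preferred English term).
TERM_BANK = {"道岔": "turnout", "钢轨": "rail", "轨枕": "sleeper"}


def check_terms(source_zh, translation_en, term_bank=TERM_BANK):
    """List term pairs whose Chinese term is in the source but whose
    preferred English term is missing from the translation."""
    lowered = translation_en.lower()
    return [
        (zh, en)
        for zh, en in term_bank.items()
        if zh in source_zh and en.lower() not in lowered
    ]


if __name__ == "__main__":
    source = "道岔区的钢轨应定期检查。"
    draft = "Rails in the switch area shall be inspected regularly."
    print(check_terms(source, draft))
```

Here the draft uses "switch" where the term bank prefers "turnout", so the pair is flagged for a human reviewer. The check enforces consistency with the term bank rather than judging whether the alternative wording is wrong.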

Section 05

Project Features and Innovations

  1. Domain Specificity: Deeply optimized for the vertical field of railway engineering
  2. Multilingual Support: Chinese, English, and Malay coverage to meet the needs of railway talent development in Southeast Asia
  3. Traceable Knowledge: Answers are annotated with their sources, so teaching content can be checked against authoritative material
  4. Hardware-Friendly: Optimized for single-card 24GB VRAM, lowering deployment barriers
  5. Complete Toolchain: Covers the entire process from data processing and model training to service deployment

Section 06

Summary and Outlook

This project demonstrates the potential of large models in vertical domain education applications. Through a professional knowledge base, refined data processing, and RAG technology, it addresses language barriers and ensures content accuracy.

In the future, we will accumulate more bilingual teaching materials, enhance model capabilities, and provide stronger technical support for railway talent development in countries along the "Belt and Road" initiative.