Zing Forum

Interpretable Large Language Model Classifier: Automatic Classification System for MTSK Mathematics Teaching Research Papers

This article introduces an interpretable classifier project based on large language models, specifically designed to automatically classify research papers in the field of Mathematical Teaching Specialized Knowledge (MTSK) into five thematic categories, and provides word-level attribution explanations using SHAP technology.

Tags: Large language models · Text classification · Interpretable AI · SHAP · Mathematics education · MTSK framework · Multilingual models · Educational technology · Literature classification · Machine learning
Published 2026-05-13 06:22 · Recent activity 2026-05-13 06:32 · Estimated read 4 min

Section 01

[Introduction] Core Overview of the MTSK Mathematics Teaching Research Paper Automatic Classification System

This article introduces the open-source project mtsk-classifier, which addresses the automatic classification of research papers in the MTSK field. The system combines a multilingual large language model (intfloat/multilingual-e5-large) with SHAP interpretability to sort papers into five thematic categories; it reports solid performance and releases open-source resources, including the model weights.


Section 02

[Background] Challenges in Classifying MTSK Research Papers

The MTSK framework is an important theory in mathematics education, and the number of related papers is growing rapidly. Manual classification is time-consuming and labor-intensive, and general text classification tools lack domain specificity, which led to the creation of this project.


Section 03

[Methodology] Technical Architecture and Interpretability Design

  1. Core model: uses the intfloat/multilingual-e5-large multilingual embedding model, with a dropout layer and a linear classification head added on top;
  2. Classification labels: T1 (Initial Teacher Training), T2 (Teacher Educator Training), T3 (MTSK for Specific Mathematical Topics), T4 (MTSK Development), T5 (MTSK Framework Expansion);
  3. Interpretability: SHAP is used to provide word-level attribution explanations, quantifying each word's contribution to the classification decision.
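The architecture above (pooled embedding → dropout → linear head over five categories) can be sketched in a few lines. This is a minimal illustrative sketch, not the project's actual code: the class name, initialization, and dropout rate are assumptions; only the hidden size (1024 for multilingual-e5-large) and the five labels T1–T5 come from the description.

```python
import numpy as np

EMBED_DIM = 1024   # hidden size of multilingual-e5-large
NUM_CLASSES = 5    # T1..T5

rng = np.random.default_rng(0)

class ClassificationHead:
    """Illustrative dropout + linear head over pooled sentence embeddings."""

    def __init__(self, embed_dim=EMBED_DIM, num_classes=NUM_CLASSES, dropout=0.1):
        self.W = rng.normal(0, 0.02, size=(embed_dim, num_classes))
        self.b = np.zeros(num_classes)
        self.dropout = dropout

    def forward(self, embeddings, train=False):
        x = embeddings
        if train:
            # inverted dropout: zero random units, rescale the rest
            mask = rng.random(x.shape) >= self.dropout
            x = x * mask / (1.0 - self.dropout)
        logits = x @ self.W + self.b
        # softmax over the five MTSK categories
        z = logits - logits.max(axis=-1, keepdims=True)
        return np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

head = ClassificationHead()
batch = rng.normal(size=(2, EMBED_DIM))  # two pooled paper embeddings
probs = head.forward(batch)
print(probs.shape)  # (2, 5): one probability distribution per paper
```

In the real system the embeddings would come from the encoder and SHAP would then attribute each predicted probability back to individual input words.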

Section 04

[Evidence] Experimental Performance and Dataset Details

  1. Experimental design: three independent runs with fixed seeds, early stopping (patience = 3), AdamW optimizer (learning rate 5e-5);
  2. Performance metrics: Macro-average F1 score of 0.7776, validation accuracy of 0.7966;
  3. Resources: The dataset contains 293 papers (request required for access), the model is published on Hugging Face (crojasce1/mtsk-classifier), and a Colab experiment notebook is provided.
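The early-stopping rule mentioned above (halt once the validation metric has not improved for 3 consecutive epochs) can be sketched as follows. The class name, API, and the example metric history are illustrative assumptions; only patience = 3 comes from the article.

```python
class EarlyStopping:
    """Stop training after `patience` epochs without validation improvement."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_metric):
        """Record one epoch's validation metric; return True when training should stop."""
        if val_metric > self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
history = [0.70, 0.75, 0.74, 0.76, 0.75, 0.75, 0.74]  # hypothetical macro-F1 per epoch
stopped_at = None
for epoch, f1 in enumerate(history):
    if stopper.step(f1):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # stops at epoch 6, best macro-F1 0.76
```

With this rule a run keeps the best checkpoint (here macro-F1 0.76 from epoch 3) and stops after three non-improving epochs, which matches the reported patience setting.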

Section 05

[Conclusion] Academic Value and Application Prospects of the Project

  1. Academic contribution: Provides an NLP application example for the field of educational technology; the interpretability design helps with the responsible application of AI;
  2. Community value: Accelerates MTSK literature reviews, discovers research trends, and identifies gaps;
  3. Extensibility: The technical architecture can be migrated to other educational fields or academic classification tasks.

Section 06

[Recommendations] Limitations and Future Research Directions

  1. Limitations: Small dataset size, unclear language coverage, strong domain specificity;
  2. Future directions: Expand the dataset, explore advanced models, develop transfer learning methods, integrate into academic platforms.