Interpreting History with AI: How Large Language Models Classify 19th-Century Swedish Patent Documents

A research project combining the KB-BERT model and generative large language models has successfully automated the classification of 19th-century Swedish historical patents, demonstrating the potential of AI in the digitization and knowledge mining of historical documents.

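Tags: large language models, historical documents, patent classification, BERT, digital humanities, Swedish NLP, text classification, KB-BERT, pre-trained models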
Published 2026-05-27 16:44 | Recent activity 2026-05-27 16:51 | Estimated read 7 min

Section 01

Introduction: Core Achievements and Significance of AI Classification for 19th-Century Swedish Patent Documents

Basic Information

  • Original Author/Maintainer: yuntingxie
  • Source Platform: GitHub
  • Original Title: patent_classification
  • Original Link: https://github.com/yuntingxie/patent_classification
  • Publication Date: May 27, 2026
  • Related Paper: "You have no class! Large Language Model Classification of Nineteenth Century Patents in Sweden, 1852-1914"

Section 02

Project Background and Research Significance

The digitization and automatic analysis of historical documents is an important topic in digital humanities. The Swedish Historical Patent Infrastructure project preserves a large collection of patent documents from 1852 to 1914, recording the trajectory of technological innovation during the Industrial Revolution. Classifying these documents by hand, however, is slow and labor-intensive and requires specialist knowledge. This project explores whether large language models can automate that classification and evaluates how well they perform.

Section 03

Technical Solution and Implementation Methods

Core Models

  1. KB-BERT Fine-tuning Scheme: Starts from the KB-BERT model trained by the National Library of Sweden, takes patent titles as input, and applies supervised fine-tuning to predict classes in the DPK classification system.
  2. Generative Large Language Model Scheme: Uses prompt engineering to guide a generative model to output a classification for each patent.
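
The two schemes above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the checkpoint name `KB/bert-base-swedish-cased` is assumed to be the KB-BERT variant used, the prompt wording is invented for the example, and the class codes passed in are placeholders. The `transformers` import is kept inside the loader so the prompt helpers run without it installed.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: the KB-BERT checkpoint published on the Hugging Face Hub.
KB_BERT_CHECKPOINT = "KB/bert-base-swedish-cased"


def build_label_maps(classes: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map class codes to integer ids and back (sorted for determinism)."""
    ordered = sorted(set(classes))
    label2id = {code: i for i, code in enumerate(ordered)}
    id2label = {i: code for code, i in label2id.items()}
    return label2id, id2label


def load_kb_bert_classifier(classes: List[str]):
    """Scheme 1: KB-BERT with a fresh classification head, ready for fine-tuning."""
    # Local import: only needed when the model is actually loaded.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label2id, id2label = build_label_maps(classes)
    tokenizer = AutoTokenizer.from_pretrained(KB_BERT_CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        KB_BERT_CHECKPOINT,
        num_labels=len(label2id),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


def build_prompt(title: str, classes: List[str]) -> str:
    """Scheme 2: a zero-shot prompt asking for exactly one class code.

    The wording is hypothetical; the source does not publish its prompts here.
    """
    options = ", ".join(sorted(set(classes)))
    return (
        "You are classifying a Swedish patent granted between 1852 and 1914.\n"
        f"Allowed DPK classes: {options}\n"
        f"Patent title: {title}\n"
        "Answer with one class code only."
    )


def parse_label(reply: str, classes: List[str]) -> Optional[str]:
    """Return the first allowed class code found in the reply, else None."""
    allowed = set(classes)
    for token in reply.replace(",", " ").replace(".", " ").split():
        if token in allowed:
            return token
    return None
```

Parsing the reply against a closed list of allowed codes is one simple way to keep a generative model's free-text output comparable with the fixed label set a fine-tuned classifier predicts.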

Data Processing

  • Data Source: Patent documents from 1852 to 1914, drawn from the Swedish Historical Patent Infrastructure (https://svenskahistoriskapatent.se/)
  • Classification System: DPK (Deutsche Patentklassifikation), the German patent classification used as the historical classification standard
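
As a rough sketch of how such records might be prepared for a title classifier: the `PatentRecord` fields below are hypothetical (the source does not describe the dataset schema), and the example simply restricts records to the 1852 to 1914 window and turns DPK class codes into integer labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PatentRecord:
    # Hypothetical schema: field names are assumptions for illustration.
    title: str
    year: int
    dpk_class: str


def select_period(
    records: List[PatentRecord], start: int = 1852, end: int = 1914
) -> List[PatentRecord]:
    """Keep only records granted within the study period (inclusive)."""
    return [r for r in records if start <= r.year <= end]


def encode(
    records: List[PatentRecord],
) -> Tuple[List[str], List[int], Dict[str, int]]:
    """Return cleaned titles, integer labels, and the class-to-id mapping."""
    label2id = {
        code: i for i, code in enumerate(sorted({r.dpk_class for r in records}))
    }
    titles = [r.title.strip() for r in records]
    labels = [label2id[r.dpk_class] for r in records]
    return titles, labels, label2id
```

The resulting `titles` and `labels` lists are in the shape most text-classification training loops expect; the mapping is returned so predictions can be translated back into class codes.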

Section 04

Research Results and Academic Value

Key Findings

The fine-tuned KB-BERT model performed strongly on the task of classifying Swedish patents from 1852 to 1914, effectively identifying technical categories from patent titles and demonstrating the potential of pre-trained models in historical document processing.

Academic Contributions

  1. Methodological Innovation: Applying modern NLP technology to historical document research
  2. Dataset Construction: Preparing a dataset for release alongside reusable technical solutions
  3. Interdisciplinary Integration: Connecting computer science and history

Data Openness

The team commits to releasing the complete dataset along with a data paper to facilitate subsequent research.

Section 05

Application Prospects and Implications

Digitization of Historical Documents

The approach can be extended to large-scale historical document digitization tasks such as classifying ancient books, organizing archives, and topic modeling of historical newspapers.

Digital Humanities Paradigm

AI-assisted classification reduces the manual effort of organizing documents, allowing researchers to focus on in-depth analysis and knowledge discovery.

Low-Resource Language Processing

The success of KB-BERT offers a reference point for processing medium-resource languages such as Swedish: domain-specific fine-tuning of a pre-trained model can deliver practical results.

Section 06

Summary of Technical Highlights

  1. Domain Adaptation: Optimizing the model for the special language style of 19th-century Swedish patents
  2. Multi-Model Comparison: Systematically comparing the performance differences between discriminative (KB-BERT) and generative models
  3. Reproducibility: The planned release of the complete code and data supports the reproducibility of the research
  4. Cross-Language Application: Demonstrating the effectiveness of pre-trained models on historical texts in a language with fewer resources than English
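
A comparison between a discriminative and a generative model could be scored along these lines. The source does not state which metrics the study reports, so accuracy and macro-averaged F1 are assumptions here, chosen because macro-F1 weights rare patent classes equally with common ones. The label values in the usage note are placeholders, not results.

```python
from typing import List


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Share of patents whose predicted class matches the gold class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); guard against an empty denominator.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Calling both functions on the same gold labels, once with each model's predictions, gives directly comparable scores; for example, `macro_f1(gold, bert_pred)` against `macro_f1(gold, llm_pred)`, where both prediction lists are hypothetical names.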
