The Old vs New Debate in Named Entity Recognition: A Comprehensive Comparative Study of Encoder Models and Generative Large Language Models

A bachelor's-thesis-level systematic study comparing the performance, efficiency, and robustness of a traditional encoder architecture (DeBERTa) and LoRA-fine-tuned generative large language models (Qwen3.5) on Named Entity Recognition tasks, providing empirical evidence for model selection in real-world applications.

Named Entity Recognition, NER, DeBERTa, Qwen, LoRA, QLoRA, Large Language Models, Encoder, Comparative Study, Natural Language Processing

Published 2026-04-07 08:41 | Recent activity 2026-04-07 15:14 | Estimated read 7 min

Section 01

Introduction: Comparison of Old and New NER Solutions (Encoder Models vs Generative Large Language Models)

This bachelor's-thesis-level study systematically compares the performance, efficiency, and robustness of a traditional encoder architecture (DeBERTa) and LoRA/QLoRA-fine-tuned generative large language models (Qwen3.5) on Named Entity Recognition (NER) tasks, with the aim of providing empirical evidence for model selection in real-world applications. It covers the fundamental differences between the two technical routes, the experimental design, the key findings, and application recommendations, giving practitioners a reference for weighing the technical trade-offs.

Section 02

Research Background: The Importance of NER Tasks and the Debate Over Technical Routes

NER is a fundamental task in natural language processing: identifying spans of unstructured text that refer to entities and classifying them by type (e.g., "Berlin" → LOC, "OpenAI" → ORG). It supports downstream applications such as information extraction and knowledge graph construction. Encoder architectures (e.g., BERT, DeBERTa) have long been the mainstream solution for NER, but the rise of generative large language models raises a new question: in resource-constrained environments, should we keep using mature encoders or switch to flexible generative models?

Section 03

Technical Routes and Experimental Design: Essential Differences Between the Two Solutions and Evaluation Framework

Differences in Technical Routes

Encoder Solution: Treats NER as a token-level classification task, outputting BIO tags (e.g., B-LOC indicates the start of an entity). Its advantages are deterministic output, low inference latency, and controllable memory usage, but it only supports fixed label sets.
Generative Solution: Converts NER into a text generation task, outputting entity lists in JSON format. Its advantage is flexibility (can dynamically specify entity types), but there are risks of parsing failure and hallucinated entities.
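To make the encoder route concrete, the following is a minimal sketch of how a sequence of BIO tags can be decoded into entity spans. The function name `bio_to_spans`, the example sentence, and the lenient handling of a stray `I-` tag are assumptions made for illustration, not details taken from the study.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags into entity spans (hypothetical helper)."""
    spans: List[Span] = []
    start, etype = None, None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # Assumption: a stray I- tag is treated as the start of an entity.
            start, etype = i, tag[2:]
    return spans


tokens = ["OpenAI", "opened", "an", "office", "in", "Berlin", "."]
tags = ["B-ORG", "O", "O", "O", "O", "B-LOC", "O"]
entities = [(" ".join(tokens[s:e]), t) for s, e, t in bio_to_spans(tags)]
print(entities)  # [('OpenAI', 'ORG'), ('Berlin', 'LOC')]
```

Because every token receives exactly one tag from a fixed set, this decoding step cannot fail to parse, which is the determinism the encoder route is credited with.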

Experimental Design

Datasets: MultiNERD (15 entity types), WNUT-17 (6 entity types).
Evaluation Metrics: Task quality (entity-level precision, recall, F1), efficiency (training time, inference latency, memory usage), model scale (trainable parameters/total parameters), robustness (JSON parsing failure rate of generative models).
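The quality and robustness metrics listed above can be sketched as follows. This is an illustration under stated assumptions rather than the study's evaluation code: the JSON schema (`text` and `type` keys), the choice to count a parse failure as an empty prediction, and the set-based matching on (text, type) pairs are all hypothetical simplifications. Stricter evaluations match on character or token offsets.

```python
import json
from typing import List, Optional, Set, Tuple

Entity = Tuple[str, str]  # (surface text, entity type)


def parse_entities(raw: str) -> Optional[Set[Entity]]:
    """Return the predicted entities, or None if the output is not usable JSON."""
    try:
        items = json.loads(raw)
        return {(item["text"], item["type"]) for item in items}
    except (json.JSONDecodeError, KeyError, TypeError):
        return None


def entity_prf(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall, and F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up model outputs; the third one is free text and cannot be parsed.
raw_outputs = [
    '[{"text": "Berlin", "type": "LOC"}]',
    '[{"text": "OpenAI", "type": "ORG"}, {"text": "Paris", "type": "LOC"}]',
    "Sure! Here are the entities: Berlin",
]
gold = [{("Berlin", "LOC")}, {("OpenAI", "ORG")}, {("Berlin", "LOC")}]

parsed = [parse_entities(r) for r in raw_outputs]
failure_rate = sum(p is None for p in parsed) / len(parsed)
# Assumption: an unparseable output contributes no predictions.
pred = [p if p is not None else set() for p in parsed]
precision, recall, f1 = entity_prf(gold, pred)
print(f"parse failure rate={failure_rate:.2f} P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Reporting the failure rate separately from F1 keeps the two effects apart: a model can extract well when its output parses and still lose recall through malformed responses.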
Model Configurations: Encoder uses DeBERTa-v3-base/large; generative models use Qwen3.5-4B/27B (fine-tuned with QLoRA, 4-bit quantization, LoRA rank 16/32).
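The effect of the LoRA rank on the number of trainable parameters follows from the method itself: adapting a weight matrix of shape d_out × d_in with rank r adds two factors of shape d_out × r and r × d_in, that is, r · (d_in + d_out) trainable values. The sketch below applies this arithmetic; the hidden size, layer count, and choice of adapted projections are hypothetical placeholders and not the actual Qwen3.5 dimensions or the study's configuration.

```python
from typing import Dict, Tuple


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical shapes for illustration only, NOT the real Qwen3.5 architecture.
hidden_size = 4096
num_layers = 32
adapted: Dict[str, Tuple[int, int]] = {
    "q_proj": (hidden_size, hidden_size),
    "v_proj": (hidden_size, hidden_size),
}

totals = {}
for rank in (16, 32):
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in adapted.values())
    totals[rank] = num_layers * per_layer
    print(f"rank={rank}: {totals[rank]:,} trainable adapter parameters")
```

Under these assumed shapes, doubling the rank from 16 to 32 doubles the adapter size, while the frozen base weights (stored in 4-bit under QLoRA) are unchanged.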

Technical Implementation

A modular architecture is adopted: the data layer handles different input formats, the training layer uses Trainer (for encoders) and SFTTrainer (for generative models), the inference layer outputs structured results, and the evaluation layer calculates multi-dimensional metrics.
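As an illustration of what "handling different input formats" in the data layer might involve, the sketch below turns one annotated sample into a token-classification example for the encoder and a prompt/completion pair for the generative model. The `Sample` class, the prompt wording, the JSON schema, and the `prompt`/`completion` field names are assumptions for this example. Subword alignment, which a real encoder pipeline needs, is omitted.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sample:
    tokens: List[str]
    spans: List[Tuple[int, int, str]]  # (start, end_exclusive, entity_type)


def to_encoder_example(sample: Sample, label2id: Dict[str, int]) -> Dict[str, list]:
    """Token-level view: one BIO label id per token."""
    tags = ["O"] * len(sample.tokens)
    for start, end, etype in sample.spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return {"tokens": sample.tokens, "labels": [label2id[t] for t in tags]}


def to_generative_example(sample: Sample, entity_types: List[str]) -> Dict[str, str]:
    """Text-generation view: an instruction prompt and a JSON target string."""
    prompt = (
        "Extract entities of types " + ", ".join(entity_types)
        + " from the text and answer with a JSON list.\nText: "
        + " ".join(sample.tokens)
    )
    target = json.dumps(
        [{"text": " ".join(sample.tokens[s:e]), "type": t} for s, e, t in sample.spans]
    )
    return {"prompt": prompt, "completion": target}


sample = Sample(tokens=["Ada", "Lovelace", "visited", "Berlin"],
                spans=[(0, 2, "PER"), (3, 4, "LOC")])
label2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-LOC": 3, "I-LOC": 4}

encoder_example = to_encoder_example(sample, label2id)
generative_example = to_generative_example(sample, ["PER", "LOC"])
print(encoder_example["labels"])         # [1, 2, 0, 3]
print(generative_example["completion"])
```

Keeping a single span-based source of truth and deriving both views from it helps ensure that the two model families are trained and evaluated on identical annotations.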

Section 04

Key Findings: Trade-offs Between Performance, Efficiency, and Robustness

Performance-Efficiency Trade-off: Encoders excel in inference speed and output determinism (no parsing step, zero failure rate); generative models are flexible but add parsing overhead and failure risks.
Value of Parameter-Efficient Fine-Tuning: LoRA and QLoRA significantly reduce memory requirements, making it possible to fine-tune 27B models on consumer GPUs and balancing performance against resource consumption.
Selection of Validation Metrics: Generative NER should use entity-level F1 rather than language model loss (perplexity) as the validation criterion, so that model selection is not misaligned with the actual task objective.
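The point about validation metrics can be illustrated with a small checkpoint-selection sketch. The numbers below are invented purely to show the mechanism and are not results from the study. The function and key names are likewise hypothetical.

```python
from typing import Dict, List


def best_checkpoint(history: List[Dict[str, float]], metric: str,
                    higher_is_better: bool) -> Dict[str, float]:
    """Pick the evaluation record that is best under the given metric."""
    choose = max if higher_is_better else min
    return choose(history, key=lambda row: row[metric])


# Invented evaluation log for illustration only (NOT measured values).
history = [
    {"step": 500, "eval_loss": 0.21, "entity_f1": 0.78},
    {"step": 1000, "eval_loss": 0.18, "entity_f1": 0.83},
    {"step": 1500, "eval_loss": 0.17, "entity_f1": 0.81},
]

by_loss = best_checkpoint(history, "eval_loss", higher_is_better=False)
by_f1 = best_checkpoint(history, "entity_f1", higher_is_better=True)
print("selected by loss:", by_loss["step"])
print("selected by entity F1:", by_f1["step"])
```

In this made-up log the two criteria select different checkpoints, which is the kind of misalignment the recommendation is meant to avoid: token-level loss can keep improving while the extracted entities get worse.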

Section 05

Application Recommendations and Conclusion: Choose the Right Solution Based on Scenarios

Application Scenario Recommendations

High-throughput real-time systems (e.g., search engines): Prioritize encoder solutions (low latency, deterministic output).
Flexible extraction needs (custom entity types): Choose generative solutions, but plan for handling parsing failures.
Resource-constrained environments: QLoRA-fine-tuned small-to-medium LLMs (e.g., 4B parameters) are a reasonable compromise.

Conclusion

This study provides quantitative comparison data and a reproducible framework for the two solutions. The boundary between them may shift as the technology evolves, but practitioners should choose a solution based on the scenario rather than blindly following trends.
