
Panoramic View of Large and Small Language Model Architectures: A Systematic Literature Review Reveals New Trends in AI System Design

A systematic literature review analyzes in depth how large language models (LLMs) and small language models (SLMs) are applied in hybrid architectures, multi-agent systems, and monolithic architectures, providing a comprehensive academic reference for AI system architecture design.

Large Language Models · Small Language Models · System Architecture · Multi-Agent · Hybrid Architecture · Literature Review · AI Engineering
Published 2026-04-03 09:30 · Recent activity 2026-04-03 09:51 · Estimated read 8 min

Section 01

[Introduction] Core Summary of the Panoramic Review on Large and Small Language Model Architectures

This article is a systematic literature review focusing on the application status of large language models (LLMs) and small language models (SLMs) in three major paradigms: hybrid architectures, multi-agent systems, and monolithic architectures. It aims to provide comprehensive academic references for AI system architecture design. The review analyzes key dimensions such as performance, applicable scenarios, and engineering complexity of different architectures, and discusses the collaboration strategies between LLMs and SLMs, research gaps, and practical implications.


Section 02

Background: Key Choices in AI Architecture Design

The explosive development of LLMs has forced profound choices in architecture design: should we concentrate resources on building one super-scale monolithic model, adopt a distributed solution in which multiple small models collaborate, or strike a hybrid balance? Meanwhile, SLMs have risen on the back of model compression and knowledge distillation techniques, offering better cost-effectiveness in some scenarios. Whether and how to choose or combine LLMs and SLMs has become a core question for architects.


Section 03

Research Methodology: Rigorous Systematic Literature Review Process

This study follows the Systematic Literature Review (SLR) methodology:

  1. Define a clear retrieval strategy (keywords, databases, time range).
  2. Formulate screening criteria (inclusion/exclusion conditions to ensure literature quality).
  3. Perform structured data extraction (research questions, methods, results, etc. from each paper).
  4. Conduct quality assessment (identify high-impact studies and methodological flaws).

The review is typeset in LaTeX, in keeping with academic standards.


Section 04

Core Findings: Comparative Analysis of Three Architecture Paradigms

The review's comparison of hybrid architectures, multi-agent systems, and monolithic architectures shows:

  • Performance and Efficiency: Hybrid/multi-agent architectures approach or surpass monolithic LLMs in specific tasks while reducing computational costs; however, monolithic LLMs still dominate complex deep reasoning tasks.
  • Applicable Scenarios: Monolithic architectures are suitable for general dialogue and creative writing; multi-agent systems are suitable for complex workflow collaboration; hybrid architectures are attractive in cost-sensitive commercial applications.
  • Engineering Complexity: Monolithic architectures are the simplest, while multi-agent systems require additional costs for coordination, communication, etc.
  • Interpretability and Controllability: Multi-agent/hybrid architectures have better interpretability and controllability due to task decomposition, making it easier to update and replace components.
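
The trade-offs above can be distilled into a rough rule of thumb. The following sketch is a hypothetical decision helper, not a prescription from the review itself; the criteria names and the priority order are illustrative assumptions:

```python
# Hypothetical rule of thumb distilling the review's comparison of the
# three paradigms. Criteria and priority order are illustrative assumptions.

def suggest_architecture(needs_deep_reasoning: bool,
                         complex_workflow: bool,
                         cost_sensitive: bool) -> str:
    if needs_deep_reasoning:
        return "monolithic LLM"        # still dominates deep reasoning tasks
    if complex_workflow:
        return "multi-agent system"    # suited to workflow collaboration
    if cost_sensitive:
        return "hybrid (LLM + SLM)"    # attractive when cost matters
    return "monolithic LLM"            # simplest to engineer by default
```

In practice these criteria interact (a cost-sensitive system may still need deep reasoning on a minority of queries), which is exactly where the hybrid collaboration strategies of the next section come in.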

Section 05

Collaboration Strategies Between LLMs and SLMs

The review focuses on collaboration modes between LLMs and SLMs:

  • Routing Mode: A lightweight model judges the complexity of queries; simple tasks are handled by SLMs, and complex tasks are transferred to LLMs.
  • Cascading Mode: SLMs generate preliminary results, which are then refined or verified by LLMs.
  • Mixture of Experts Mode: SLMs act as domain experts and are called by LLMs or routers.
  • Distillation and Fine-tuning: LLMs serve as teacher models to train dedicated SLMs through knowledge distillation.
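
The routing mode is the most mechanical of these strategies and can be sketched in a few lines. Everything below is a toy illustration: the complexity heuristic, the threshold, and the stubbed-out model calls are all assumptions standing in for a real classifier and real model APIs:

```python
# Toy sketch of the "routing mode": a cheap heuristic scores query
# complexity and dispatches to an SLM or an LLM. The model calls are
# stubs; in practice they would be requests to actual models.

def complexity_score(query: str) -> float:
    """Illustrative heuristic: longer, clause-heavy queries score higher."""
    words = query.split()
    clause_markers = sum(query.count(c) for c in (",", ";", "?"))
    return len(words) / 50.0 + clause_markers / 5.0

def call_slm(query: str) -> str:
    return f"[SLM answer to: {query!r}]"   # stand-in for a small-model call

def call_llm(query: str) -> str:
    return f"[LLM answer to: {query!r}]"   # stand-in for a large-model call

def route(query: str, threshold: float = 1.0) -> str:
    """Send simple queries to the SLM, complex ones to the LLM."""
    if complexity_score(query) < threshold:
        return call_slm(query)
    return call_llm(query)
```

Production routers typically replace the heuristic with a small trained classifier; the cascading mode differs only in that the SLM answers first and the LLM is invoked to refine or verify when a confidence check fails.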

Section 06

Research Gaps and Future Directions

The review points out the following research gaps:

  1. Lack of Standardized Evaluation: Cross-study benchmarks and metrics are not unified, making comparisons difficult.
  2. Insufficient Long-term Stability: There is a lack of research on stability, drift, and degradation of architectures during long-term operation.
  3. Safety and Alignment Challenges: Safety alignment for multi-component systems is more complex, and the risks of component interactions need to be explored.
  4. Inadequate Economic Analysis: There is insufficient analysis of economic dimensions such as Total Cost of Ownership (TCO) and Return on Investment (ROI).
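
To make the economic gap concrete, a back-of-envelope serving-cost comparison is easy to set up. All figures below are made-up assumptions for illustration, not real vendor rates or numbers from the review:

```python
# Illustrative TCO back-of-envelope: serving every query with an LLM
# vs. routing a fraction of traffic to a cheaper SLM. All prices and
# volumes are assumed for illustration only.

def monthly_cost(queries: int, llm_price: float, slm_price: float,
                 slm_fraction: float) -> float:
    """Cost when `slm_fraction` of queries hit the SLM, the rest the LLM."""
    return queries * (slm_fraction * slm_price + (1 - slm_fraction) * llm_price)

QUERIES = 1_000_000   # monthly query volume (assumed)
LLM_PRICE = 0.01      # dollars per LLM query (assumed)
SLM_PRICE = 0.001     # dollars per SLM query (assumed)

monolithic = monthly_cost(QUERIES, LLM_PRICE, SLM_PRICE, slm_fraction=0.0)
hybrid = monthly_cost(QUERIES, LLM_PRICE, SLM_PRICE, slm_fraction=0.7)
print(f"monolithic: ${monolithic:,.0f}/mo, hybrid: ${hybrid:,.0f}/mo")
# → monolithic: $10,000/mo, hybrid: $3,700/mo
```

A full TCO analysis would also need engineering and operations costs (routing infrastructure, monitoring, the quality loss on misrouted queries), which is precisely the dimension the review finds under-studied.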

Section 07

Implications for Practitioners

Implications for AI architects and engineers:

  • Avoid Blindly Pursuing Larger Models: Model size is not the only criterion; it is necessary to consider task characteristics, performance requirements, and cost constraints.
  • Consider Evolution Path: Architecture design should reserve space for future upgrades and expansions.
  • Pay Attention to Operation and Maintenance Complexity: Multi-agent/hybrid architectures are more difficult to monitor and debug in production environments.
  • Establish Evaluation Systems: Need to establish evaluation systems aligned with business goals, focusing on actual business indicators rather than just benchmark scores.