Zing Forum

Reading

Survey of Large Language Model Integration Technologies: A Systematic Study on Harnessing Multiple LLMs

A survey paper accepted by IJCAI Survey 2026 that systematically reviews the research progress in the LLM Ensemble field, proposes a three-stage classification framework (pre-inference, in-inference, post-inference), and compiles relevant papers, benchmark tests, and application cases.

LLM Ensemble · survey · multi-model · routing · model fusion · IJCAI · AI
Published 2026-05-11 01:24 · Recent activity 2026-05-11 01:30 · Estimated read 6 min

Section 01

[Introduction] Survey of LLM Integration Technologies: A Systematic Study Accepted by IJCAI Survey 2026

This survey paper accepted by IJCAI Survey 2026 systematically reviews the research progress in the LLM Ensemble (Large Language Model Integration) field, proposes a three-stage classification framework (pre-inference, in-inference, post-inference), and compiles relevant papers, benchmark tests, application cases, and supporting resource libraries, providing researchers and practitioners with a systematic knowledge framework and reference materials.

Section 02

Research Background and Motivation

Dozens of large language models with different architectures, training data, and capability profiles are currently available; some excel at code generation, others at reasoning or multilingual processing. The traditional approach of committing to a single model has clear limitations: performance on a given query can vary significantly from model to model. The core idea of LLM Ensemble is the dynamic selection or combination of multiple models, similar to classical ensemble learning but subject to practical constraints such as latency, cost, and model availability.

Section 03

Detailed Explanation of the Three-Stage Classification Framework

The paper proposes a three-stage classification framework:

  1. Pre-inference Integration: The core is the routing mechanism, which assigns models based on query characteristics. It includes discrete utility methods (capability label classification) and continuous utility methods (performance scores/response length). The challenge is predicting performance for unseen queries.
  2. In-inference Integration: Fine-grained fusion, including token-level (integrating output distributions), span-level (segment fusion), and process-level (intervening in the reasoning phase). Collaboration is deep but complexity is high.
  3. Post-inference Integration: Fusion after multiple models generate complete responses, with non-cascading (voting, ranking, summarization) and cascading (lightweight models first, calling strong models if necessary) strategies.
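The cascading variant of stage 3 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model callables, the self-reported confidence scores, and the threshold value are all hypothetical stand-ins.

```python
# Minimal sketch of a post-inference cascading strategy: call a
# lightweight model first and escalate to a stronger (costlier) model
# only when the cheap model's self-reported confidence is too low.
# The model callables and threshold below are illustrative stand-ins.

def cascade(query, cheap_model, strong_model, threshold=0.8):
    """Return (answer, which_model_answered) for a query."""
    answer, confidence = cheap_model(query)
    if confidence >= threshold:
        return answer, "cheap"
    # Confidence too low: fall back to the stronger model.
    answer, _ = strong_model(query)
    return answer, "strong"

# Toy stand-ins for real LLM calls, returning (answer, confidence).
def cheap_model(query):
    return ("short answer", 0.6 if "hard" in query else 0.9)

def strong_model(query):
    return ("detailed answer", 0.95)

print(cascade("easy question", cheap_model, strong_model))  # ('short answer', 'cheap')
print(cascade("hard question", cheap_model, strong_model))  # ('detailed answer', 'strong')
```

The threshold directly encodes the cost/performance trade-off: raising it improves quality but increases how often the expensive model is called.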

Section 04

Key Technologies and Methods

Key Technologies:

  • Pre-inference: Training router models to predict optimal models, multi-armed bandit online learning, dynamic routing using model confidence/self-assessment.
  • In-inference: Aligning vocabularies and probability distributions (forced decoding, logits interpolation weighting), collaborative reasoning via alternating generation between models.
  • Post-inference: Simple methods (majority voting, ROUGE/BERTScore selection), training evaluation/summarization models for aggregation, designing thresholds for cascading strategies to balance cost and performance.
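The logits-interpolation idea from the in-inference bullet can be sketched in a few lines. This assumes the two models' vocabularies have already been aligned (the hard part in practice); the weights and toy distributions are illustrative, not from the survey.

```python
# Minimal sketch of token-level fusion via logits interpolation,
# assuming both models share an aligned vocabulary. Weights and
# toy logit values are illustrative.
import math

def softmax(logits):
    """Convert a token->logit dict into a token->probability dict."""
    m = max(logits.values())
    exp = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exp.values())
    return {tok: v / z for tok, v in exp.items()}

def fuse_next_token(logits_a, logits_b, weight_a=0.5):
    """Interpolate two models' logits over the shared vocabulary
    and return the most probable next token."""
    fused = {tok: weight_a * logits_a[tok] + (1 - weight_a) * logits_b[tok]
             for tok in logits_a}
    probs = softmax(fused)
    return max(probs, key=probs.get)

# Model A slightly prefers "cat"; model B strongly prefers "dog".
logits_a = {"cat": 2.0, "dog": 1.8, "bird": 0.1}
logits_b = {"cat": 0.5, "dog": 3.0, "bird": 0.2}
print(fuse_next_token(logits_a, logits_b))  # -> dog
```

With equal weights, model B's strong preference outweighs model A's mild one; shifting `weight_a` toward 1.0 recovers model A's choice.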

Section 05

Benchmark Tests and Practical Applications

Benchmark Tests: Cover tasks such as question answering, code generation, mathematical reasoning, and instruction following, evaluating both final performance and efficiency metrics (average number of models called, latency, API cost).

Application Scenarios: Code generation improves the pass rate on complex tasks; question answering balances accuracy and speed; creative writing produces diverse, high-quality outputs.
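The efficiency metrics mentioned above can be computed from a run log along these lines; the record format below is a hypothetical example, not the benchmarks' actual schema.

```python
# Sketch of the efficiency metrics named above, computed over a
# hypothetical log of ensemble runs (field names are illustrative).

runs = [
    {"models_called": 1, "latency_s": 0.8, "cost_usd": 0.002},
    {"models_called": 3, "latency_s": 2.1, "cost_usd": 0.011},
    {"models_called": 2, "latency_s": 1.4, "cost_usd": 0.006},
]

def efficiency_metrics(runs):
    """Aggregate per-run logs into the three efficiency metrics."""
    n = len(runs)
    return {
        "avg_models_called": sum(r["models_called"] for r in runs) / n,
        "avg_latency_s": sum(r["latency_s"] for r in runs) / n,
        "total_cost_usd": sum(r["cost_usd"] for r in runs),
    }

print(efficiency_metrics(runs))
```

These numbers are what make ensemble methods comparable: two strategies with the same accuracy can differ sharply in average models called and total API cost.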

Section 06

Supporting Resources and Community Contributions

The authors maintain the Awesome-LLM-Ensemble repository on GitHub, which organizes the relevant papers by category, links to public implementations, and is continuously updated with new papers; the community is encouraged to contribute missing or newly published work via Pull Request or email.

Section 07

Future Directions and Summary

Future Directions: Adaptive strategies for dynamic integration, online learning mechanisms, deep fusion of heterogeneous models, and the optimal trade-off between efficiency and performance.

Summary: LLM Ensemble is an important trend in AI's evolution from single models to multi-model collaboration. The survey and its accompanying resources provide a foundation for the field and will play a key role in building more intelligent and reliable AI systems.