# CIRCLE: A New Paradigm for Transforming Large Multimodal Models into General In-Context Classifiers

> The CIRCLE framework proposes an innovative approach to reposition large multimodal models as general in-context classifiers, enabling flexible cross-modal and cross-task classification capabilities without fine-tuning.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-05T09:11:53.000Z
- Last activity: 2026-04-05T09:17:48.468Z
- Popularity: 148.9
- Keywords: multimodal models, in-context learning, image classification, CVPR 2026, few-shot learning, cross-modal understanding, artificial intelligence
- Page link: https://www.zingnex.cn/en/forum/thread/circle
- Canonical: https://www.zingnex.cn/forum/thread/circle
- Markdown source: floors_fallback

---

## Introduction: CIRCLE, a New Paradigm for General In-Context Classification with Large Multimodal Models

CIRCLE repositions large multimodal models (LMMs) as general in-context classifiers that handle cross-modal and cross-task classification without fine-tuning. The work was accepted as a Findings paper at CVPR 2026. Core keywords: multimodal models, in-context learning, image classification, CVPR 2026, few-shot learning, cross-modal understanding, artificial intelligence.

## Research Background and Motivation

Classification is a core problem in computer vision, natural language processing, and multimodal learning. Traditional classifiers need large amounts of labeled training data and task-specific fine-tuning, which is time-consuming and labor-intensive and makes it hard to keep up with rapidly changing task requirements. With the rise of large multimodal models (LMMs), researchers are exploring how to use their capabilities to solve classification problems in a more flexible and general way. CIRCLE (Large Multimodal Models as General In-Context Classifiers) was proposed in this context. It aims to reposition LMMs as general in-context classifiers that can perform complex classification tasks without fine-tuning.

## Core Technical Innovations

### New Paradigm of In-Context Learning
CIRCLE extends in-context learning to multimodal data such as images, video, and audio. With carefully designed prompts, the model infers the task from a small number of examples and applies that understanding to new inputs.
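As a rough illustration of the idea, the sketch below assembles a few-shot classification prompt by interleaving labeled demonstrations with the query. It is a minimal sketch, not CIRCLE's actual prompt format: the `Example` class, the `build_prompt` function, the message dictionary layout, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Example:
    media_ref: str  # hypothetical handle for an image, video, or audio clip
    label: str


def build_prompt(task, labels, examples, query_ref):
    """Interleave demonstrations and the query into one message list."""
    parts = [{"type": "text",
              "text": f"{task} Answer with one of: {', '.join(labels)}."}]
    for ex in examples:
        # Each demonstration is a (media, label) pair shown before the query.
        parts.append({"type": "media", "ref": ex.media_ref})
        parts.append({"type": "text", "text": f"Label: {ex.label}"})
    # The query comes last, with the label left open for the model to complete.
    parts.append({"type": "media", "ref": query_ref})
    parts.append({"type": "text", "text": "Label:"})
    return parts


demos = [Example("img_001.jpg", "cat"), Example("img_002.jpg", "dog")]
prompt = build_prompt("Classify the animal.", ["cat", "dog"], demos, "img_query.jpg")
```

How the resulting list is passed to a model depends on the LMM's own API, which this sketch deliberately leaves out.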

### Unified Cross-Modal Representation
CIRCLE establishes a unified representation space in which data from different modalities can be compared and classified at the same semantic level. This improves generalization and lets the model handle modality combinations it has not seen before.
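One common way to use a shared embedding space for classification is to compare a query embedding against one embedding per class and pick the closest. The sketch below shows that generic technique with cosine similarity; it assumes the embeddings already live in the same space and is not taken from the CIRCLE paper. The vectors and the names `cosine` and `classify` are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query_vec, class_vecs):
    """Return the label whose embedding is most similar to the query."""
    return max(class_vecs, key=lambda label: cosine(query_vec, class_vecs[label]))


# Toy vectors: class embeddings could come from text, the query from an image.
class_vecs = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
image_vec = [0.9, 0.1, 0.0]
predicted = classify(image_vec, class_vecs)  # "cat"
```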

### Dynamic Category Space Adaptation
New categories can be defined freely at inference time. The model adapts immediately without retraining, which makes the approach suitable for open-world scenarios.
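Because the category set lives in the prompt rather than in a trained output layer, changing it can be as simple as editing a data structure. The following sketch shows one possible way to manage such a label set; the `CategorySpace` class and the example categories are hypothetical and only illustrate the idea.

```python
class CategorySpace:
    """Mutable label set that is rendered into the prompt at inference time."""

    def __init__(self):
        self._categories = {}  # name -> short description, insertion-ordered

    def add(self, name, description):
        self._categories[name] = description

    def remove(self, name):
        self._categories.pop(name, None)

    def names(self):
        return list(self._categories)

    def render(self):
        """Format the current categories as a bullet list for the prompt."""
        return "\n".join(f"- {n}: {d}" for n, d in self._categories.items())


space = CategorySpace()
space.add("electronics", "phones, laptops, accessories")
space.add("apparel", "clothing and shoes")
space.add("pet supplies", "food and toys for pets")  # added later, no retraining step
```

The next prompt built from `space.render()` would simply include the new category.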

## Technical Implementation Details

### Prompt Engineering and Example Selection
CIRCLE selects examples by retrieval: based on the features of the input query, it retrieves the most relevant samples from the example library, taking task semantics and modality alignment into account. As a result, even a small number of examples can provide sufficient context.
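Retrieval-based example selection is often implemented as a top-k similarity search over precomputed embeddings. The sketch below assumes that setup and ranks by cosine similarity only; how CIRCLE actually weighs task semantics and modality alignment is not described here, so `select_examples` and the toy library are illustrative.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query_vec, library, k=3):
    """Return the k library entries most similar to the query, best first.

    library is a list of (embedding, label) pairs.
    """
    return heapq.nlargest(k, library, key=lambda item: cosine(query_vec, item[0]))


library = [([1.0, 0.0], "a"), ([0.0, 1.0], "b"), ([0.9, 0.1], "c")]
top = select_examples([1.0, 0.0], library, k=2)
top_labels = [label for _, label in top]  # ["a", "c"]
```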

### Multi-Scale Feature Fusion
A multi-scale feature fusion mechanism combines low-level features, which capture fine detail, with high-level features, which capture abstract semantics. Fusing them adaptively improves classification accuracy.
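A simple form of adaptive fusion is a softmax-weighted sum of feature vectors, where the weights come from gate scores. The sketch below shows that generic pattern under two assumptions: the features from each scale have already been projected to the same dimension, and the gate scores are given (in a real model they would be learned or predicted). It is not the fusion module from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, gate_scores):
    """Weighted sum of same-length feature vectors, weights = softmax(gate_scores)."""
    weights = softmax(gate_scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


low_level = [1.0, 0.0]   # stands in for detail-oriented features
high_level = [0.0, 1.0]  # stands in for semantic features
fused = fuse([low_level, high_level], gate_scores=[0.0, 0.0])  # equal weights
```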

### Confidence Calibration and Rejection Mechanism
CIRCLE adds confidence calibration. When the model is uncertain, it can decline to classify or request more information, which improves system reliability.
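Temperature scaling followed by a confidence threshold is one widely used way to get calibration with a reject option. The sketch below uses that technique as a stand-in, since the specific calibration method is not given here. The `temperature` and `threshold` values are illustrative; in practice the temperature is usually fit on held-out data.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, temperature=1.5, threshold=0.6):
    """Return (label, confidence), or (None, confidence) when below threshold.

    logits maps each label to a raw score.
    """
    labels = list(logits)
    probs = softmax([logits[name] / temperature for name in labels])
    best = max(range(len(labels)), key=lambda i: probs[i])
    confidence = probs[best]
    if confidence < threshold:
        return None, confidence  # caller can abstain or ask for more input
    return labels[best], confidence


confident = predict_or_reject({"cat": 4.0, "dog": 0.0})
uncertain = predict_or_reject({"cat": 0.2, "dog": 0.0})
```

Here `confident` yields the label `"cat"`, while `uncertain` yields `None` because the top probability stays below the threshold.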

## Experimental Validation and Performance

### Cross-Domain Generalization Ability
When transferring from natural images to medical images, and from everyday scenes to specialized domains, CIRCLE consistently outperforms traditional fine-tuning methods. This demonstrates the advantage of in-context learning in capturing general classification principles.

### Few-Shot Learning Performance
With only 1 to 5 examples per category, CIRCLE achieves performance close to that of models trained on the full dataset. This has significant practical value in fields where annotation is expensive, such as medicine and remote sensing.
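To make the few-shot setting concrete, the sketch below draws a k-shot support set from a labeled pool: at most `k` items per category, with a fixed seed for repeatability. The `sample_support` function and the file names are hypothetical and are not the paper's evaluation protocol.

```python
import random
from collections import defaultdict


def sample_support(pool, k, seed=0):
    """Pick up to k items per label from a list of (item, label) pairs."""
    by_label = defaultdict(list)
    for item, label in pool:
        by_label[label].append(item)
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    return {label: rng.sample(items, min(k, len(items)))
            for label, items in by_label.items()}


pool = [("scan_01", "benign"), ("scan_02", "benign"), ("scan_03", "benign"),
        ("scan_04", "malignant"), ("scan_05", "malignant")]
support = sample_support(pool, k=2)
```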

### Unified Multi-Task Processing
A single framework handles fine-grained image classification, zero-shot classification, multi-label classification, and similar tasks without changing the model architecture or training process, which simplifies deployment.

## Application Value, Limitations, and Future Directions

## Practical Application Value

### Rapid Prototype Development
CIRCLE gives researchers and developers a way to test classification ideas without training, shortening the cycle from idea to prototype and speeding up iteration.

### Dynamic Category System
In scenarios where categories change frequently, such as e-commerce and content moderation, administrators can add or modify categories at any time without waiting for the model to be retrained.

### Multimodal Content Understanding
CIRCLE provides a technical foundation for systems that understand text, images, and video together and can adapt to diverse forms of content.

## Limitations and Future Directions

### Limitations
- In-context learning performance depends heavily on the quality of the examples; automatically selecting optimal examples remains an open problem;
- In extremely fine-grained classification tasks, in-context learning struggles to capture subtle category boundaries.

### Future Directions
- Integrate Retrieval-Augmented Generation (RAG) to expand the amount of contextual information;
- Explore efficient example compression methods to handle long contexts;
- Extend to more modalities (e.g., 3D point clouds, molecular structures).

## Summary and Outlook

CIRCLE marks a shift in how multimodal models are applied, from "fine-tuning for each task" to "one model for all tasks". This shift improves efficiency and makes AI systems more flexible and adaptable. As multimodal models continue to improve, CIRCLE-like methods are likely to play a key role in more practical scenarios and help make artificial intelligence more general and more practical.
