# InfoBuy: A Strategy Learning Framework for Information Procurement in Large-Small Model Collaborative Reasoning

> InfoBuy models large-small model collaborative reasoning as an information procurement problem: the small model learns when to buy hints, when to buy verification, how many teacher tokens to buy, and whether to trust what it bought. It is built on the HSP protocol and trained in two stages, SFT followed by RL.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-06T02:55:16.000Z
- Last activity: 2026-06-06T03:22:20.195Z
- Popularity: 159.6
- Keywords: large-small model collaboration, information procurement, HSP protocol, reinforcement learning, GRPO, model distillation, inference optimization, open-source project
- Page link: https://www.zingnex.cn/en/forum/thread/infobuy
- Canonical: https://www.zingnex.cn/forum/thread/infobuy

---

## InfoBuy Framework Guide: Information Procurement Strategy for Large-Small Model Collaborative Reasoning

InfoBuy is an open-source framework for large-small model collaborative reasoning, developed by nicebro123. Its core idea is to model the collaboration as an **information procurement problem**: the small model learns when to buy hints, when to buy verification, how many teacher tokens to buy, and whether to trust the information it bought. The framework is built on the HSP protocol and trained in two stages, Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL), with the aim of building efficient, cost-effective AI systems.

## Research Background and Core Motivation

Large models (e.g., GPT-4) reason well but are expensive to deploy and slow to respond; small models are lightweight and fast but struggle with complex reasoning. The key question is how a small model can efficiently "borrow" a large model's capabilities while still working independently where it can. InfoBuy recasts this collaboration as a series of information procurement decisions, so the small model can adjust its help-seeking strategy dynamically.

## Core Concept: HSP Information Procurement Protocol

InfoBuy defines a structured information exchange mechanism based on the HSP protocol. The small model procures information from the large model by emitting specific tags:
- `<ASK>N</ASK>`: Request up to N tokens of reasoning hints
- `<VERIFY>N</VERIFY>`: Request up to N tokens of verification
- `<ACCEPT>`: Adopt and trust the feedback from the teacher model
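To make the tag grammar concrete, here is a minimal sketch of how a student generation could be scanned for the three tags listed above. It assumes the tags appear literally in the text exactly as written; the names `HspAction`, `parse_hsp_actions`, and the `max_budget` cap are hypothetical and not taken from the InfoBuy code base.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed grammar: <ASK>N</ASK>, <VERIFY>N</VERIFY>, or a bare <ACCEPT>.
# The back-reference \1 rejects mismatched pairs such as <ASK>8</VERIFY>.
HSP_TAG = re.compile(r"<(ASK|VERIFY)>(\d+)</\1>|<(ACCEPT)>")


@dataclass
class HspAction:
    kind: str              # "ASK", "VERIFY", or "ACCEPT"
    budget: Optional[int]  # requested teacher-token cap; None for ACCEPT
    position: int          # character offset of the tag in the student text


def parse_hsp_actions(text: str, max_budget: int = 256) -> List[HspAction]:
    """Extract procurement actions in order of appearance.

    max_budget is a hypothetical safety cap on how many teacher tokens
    a single request may buy.
    """
    actions = []
    for match in HSP_TAG.finditer(text):
        if match.group(3):  # the <ACCEPT> alternative matched
            actions.append(HspAction("ACCEPT", None, match.start()))
        else:
            budget = min(int(match.group(2)), max_budget)
            actions.append(HspAction(match.group(1), budget, match.start()))
    return actions
```

Capping the requested budget on the parsing side is one possible design choice: it keeps a badly calibrated student from buying an unbounded number of teacher tokens in a single request.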

## Technical Architecture: Two-Stage Training Process (SFT+RL)

### SFT Supervised Fine-Tuning Stage
This stage builds reasoning trajectories annotated with HSP tags and fine-tunes the small model on them using a data collator and a trainer. The checkpoints produced here ensure that the model has learned the basics of the protocol.
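As an illustration of what one such training record might look like, the sketch below flattens a trajectory into a prompt/completion pair and records which character spans were written by the student. The `<TEACHER>` wrapper, the `Segment` type, and the idea of restricting the loss to student spans are assumptions made for this example; the source does not describe the actual data format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    role: str  # "student" or "teacher"
    text: str


def build_sft_example(question: str, segments: List[Segment]) -> Dict:
    """Flatten one trajectory and mark the student-authored spans.

    Teacher replies are wrapped in a hypothetical <TEACHER> tag and left
    out of loss_spans, so a trainer could mask them from the loss.
    """
    parts = []
    loss_spans = []  # (start, end) character offsets into the completion
    cursor = 0
    for seg in segments:
        if seg.role == "student":
            piece = seg.text
            loss_spans.append((cursor, cursor + len(piece)))
        else:
            piece = f"<TEACHER>{seg.text}</TEACHER>"
        parts.append(piece)
        cursor += len(piece)
    return {
        "prompt": question,
        "completion": "".join(parts),
        "loss_spans": loss_spans,
    }
```

A trajectory in which the student asks for a 32-token hint, receives it, and accepts it would then be built from three segments: student, teacher, student.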

### RL Reinforcement Learning Stage
This stage optimizes the policy with the GRPO algorithm. An HSP rollout state machine manages procurement decisions during generation, and the reward function evaluates:
- Procurement efficiency (solving problems with the fewest steps)
- Answer correctness
- Autonomy balance
- Trust calibration
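The four criteria above can be combined into a scalar reward in many ways. The following is one hedged sketch, not the project's actual reward: the `Rollout` fields, the `token_price` and `trust_bonus` weights, and the way autonomy is folded into the token cost are all assumptions. The second function shows the group-relative advantage that GRPO computes by normalizing each reward against the other rollouts sampled for the same prompt.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Rollout:
    correct: bool            # final answer matches the reference
    teacher_tokens: int      # teacher tokens actually purchased
    accepted: bool           # student emitted <ACCEPT> for the feedback
    teacher_was_right: bool  # whether that feedback was in fact correct


def infobuy_reward(r: Rollout, token_price: float = 0.002,
                   trust_bonus: float = 0.2) -> float:
    """Hypothetical reward: correctness minus purchase cost plus trust term."""
    reward = 1.0 if r.correct else 0.0
    # Efficiency and autonomy: every purchased teacher token costs a little,
    # so solving the problem alone is never penalized.
    reward -= token_price * r.teacher_tokens
    # Trust calibration: reward accepting good feedback or rejecting bad
    # feedback, and penalize the two mismatched cases.
    if r.teacher_tokens > 0:
        calibrated = r.accepted == r.teacher_was_right
        reward += trust_bonus if calibrated else -trust_bonus
    return reward


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(x - mu) / (sigma + eps) for x in rewards]
```

Because the advantage is relative to the group, a rollout that buys many tokens and still fails is pushed down against siblings that solved the same problem cheaply, which is the behavior the procurement framing calls for.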

## Project Structure and Engineering Practice

The repository is organized as follows:
- `SFT_stage/`: Protocol SFT data construction and training scripts
- `RL_stage/`: GRPO configuration, state machine, and reward function
- `eval/`: Collaborative generation and evaluation tools
- `setup/`: Environment configuration scripts
- `docs/hsp/`: Method documentation and training instructions
- `utils/`: Teacher service tools

Large artifacts (weights, datasets) are managed via the `INFOBUY_STORE` environment variable, which keeps them out of Git.
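A script could resolve artifact locations against that variable along these lines. This is a sketch under assumptions: it treats `INFOBUY_STORE` as a root directory, and the `store_path` helper, the fallback directory, and the subdirectory names are hypothetical rather than taken from the repository.

```python
import os
from pathlib import Path


def store_path(*parts: str, default: str = "./infobuy_store") -> Path:
    """Resolve a path under the artifact root named by INFOBUY_STORE.

    The fallback directory is a hypothetical default for this example.
    """
    root = Path(os.environ.get("INFOBUY_STORE", default)).expanduser()
    return root.joinpath(*parts)


# Hypothetical layout: checkpoints and datasets live outside the Git tree.
sft_checkpoint_dir = store_path("checkpoints", "sft")
```

Reading the variable at call time rather than at import time lets the same code run unchanged on machines that mount storage in different places.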

## Research Significance and Application Prospects

### Theoretical Contributions
InfoBuy formalizes large-small model collaboration as an economic decision problem and draws on concepts from information economics to optimize it.

### Practical Value
- Edge computing: On-device small models procure information from cloud-based large models on demand
- Cost-sensitive applications: Reduce API call costs while ensuring quality
- Progressive capability improvement: Small models expand their capability boundaries by learning to seek help

### Educational and Research Tools
The project provides a complete training pipeline and evaluation tools, supporting experiments with reward design, policy variants, and domain-specific applications.

## Technical Challenges and Future Directions

### Challenges
- Trust calibration: The small model must balance credulity and skepticism toward teacher outputs
- Dynamic procurement costs: The policy must adapt to changes in teacher model latency and cost
- Multi-round procurement optimization: Complex problems require planning the best sequence of purchases

### Future Directions
- Introduce conditional/batch procurement strategies
- Explore multi-teacher information source selection
- Extend to multi-modal tasks

## Summary: Value and Outlook of the InfoBuy Framework

InfoBuy provides a structured framework for large-small model collaborative reasoning, turning the intuition of information procurement into a trainable policy problem. Through two-stage training, the small model learns to balance autonomy against external help, pointing toward more efficient and cost-effective AI systems. The project is a useful starting point for developers and researchers working on model efficiency, edge deployment, or large-small model collaboration.
