Zing Forum

InfoBuy: A Strategy Learning Framework for Information Procurement in Large-Small Model Collaborative Reasoning

Models large-small model collaborative reasoning as an information procurement problem in which the small model learns when to purchase hints, when to purchase verification, how many teacher tokens to buy, and whether to trust the purchased information. Built on the HSP protocol, it uses a two-stage SFT and RL training process.

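Large-Small Model Collaboration · Information Procurement · HSP Protocol · Reinforcement Learning · GRPO · Model Distillation · Reasoning Optimization · Open-Source Project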
Published 2026-06-06 10:55 · Recent activity 2026-06-06 11:22 · Estimated read 8 min

Section 01

InfoBuy Framework Guide: Information Procurement Strategy for Large-Small Model Collaborative Reasoning

InfoBuy is an open-source large-small model collaborative reasoning framework developed by nicebro123. Its core idea is to model large-small model collaboration as an information procurement problem: the small model learns when to purchase hints, when to purchase verification, how many teacher tokens to buy, and whether to trust the purchased information. Built on the HSP protocol, it uses a two-stage training process of Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL), offering a new approach to building efficient, cost-effective AI systems.

Section 02

Research Background and Core Motivation

Large models (e.g., GPT-4) have strong reasoning capabilities but are costly to deploy and slow to respond; small models are lightweight and efficient but struggle with complex reasoning. A key question is how a small model can efficiently "borrow" a large model's capabilities while remaining largely independent. InfoBuy recasts collaboration as a series of information procurement decisions, allowing the small model to dynamically adjust when and how it seeks help.

Section 03

Core Concept: HSP Information Procurement Protocol

InfoBuy defines a structured information exchange mechanism based on the HSP protocol, in which the small model procures information from the large (teacher) model by emitting specific tags:

  • <ASK>N</ASK>: Request up to N tokens of reasoning hints
  • <VERIFY>N</VERIFY>: Request up to N tokens of verification feedback
  • <ACCEPT>: Adopt and trust the feedback from the teacher model
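
To make the tag semantics concrete, here is a minimal Python sketch of how a rollout loop might detect purchase requests and enforce the token budget N. It assumes the tags appear verbatim in the small model's output and that a teacher is any function returning a list of tokens; `parse_hsp_actions`, `fulfil_request`, and `toy_teacher` are hypothetical names, not the project's API.

```python
import re
from typing import Callable, List, Tuple

# Assumed tag grammar, taken literally from the protocol description:
# <ASK>N</ASK>, <VERIFY>N</VERIFY> and a bare <ACCEPT>.
HSP_TAG = re.compile(r"<ASK>(\d+)</ASK>|<VERIFY>(\d+)</VERIFY>|<ACCEPT>")

# Hypothetical teacher interface: (request kind, context so far) -> list of tokens.
Teacher = Callable[[str, str], List[str]]


def parse_hsp_actions(text: str) -> List[Tuple[str, int]]:
    """Return procurement actions in order of appearance as (kind, budget) pairs.

    <ACCEPT> carries no token budget, so it is reported with budget 0.
    """
    actions: List[Tuple[str, int]] = []
    for m in HSP_TAG.finditer(text):
        if m.group(1) is not None:
            actions.append(("ASK", int(m.group(1))))
        elif m.group(2) is not None:
            actions.append(("VERIFY", int(m.group(2))))
        else:
            actions.append(("ACCEPT", 0))
    return actions


def fulfil_request(kind: str, budget: int, context: str, teacher: Teacher) -> List[str]:
    """Ask the teacher and enforce the purchased budget: at most `budget` tokens are returned."""
    return teacher(kind, context)[:budget]


def toy_teacher(kind: str, context: str) -> List[str]:
    # Stand-in for a real teacher model; always returns the same five tokens.
    return ["Factor", "the", "quadratic", "first", "."]


student_text = "I am unsure about step 2. <ASK>3</ASK>"
for kind, budget in parse_hsp_actions(student_text):
    print(kind, fulfil_request(kind, budget, student_text, toy_teacher))
# ASK ['Factor', 'the', 'quadratic']
```

Truncating the teacher's reply to N tokens is what turns N into a price: the student receives at most the budget it asked for.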

Section 04

Technical Architecture: Two-Stage Training Process (SFT+RL)

SFT Supervised Fine-Tuning Stage

Reasoning trajectories annotated with HSP tags are constructed and then used for fine-tuning through the project's data organizers and trainers. The resulting checkpoints give the model a working command of the protocol basics before reinforcement learning begins.
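
A minimal sketch of what one HSP-tagged SFT example might look like, assuming a single hint purchase followed by <ACCEPT>. The segment layout, the choice to exclude the prompt and the purchased teacher tokens from the loss, and the names `Segment`, `build_hsp_trajectory`, `render`, and `trainable_char_mask` are all assumptions for illustration; the source does not specify the project's data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    trainable: bool  # True if this span should contribute to the SFT loss


def build_hsp_trajectory(question: str, reasoning: str, ask_budget: int,
                         teacher_hint: str, answer: str) -> List[Segment]:
    """Assemble one hypothetical SFT example in which the student buys a hint once."""
    return [
        Segment(question + "\n", False),                          # prompt: never trained on
        Segment(f"{reasoning} <ASK>{ask_budget}</ASK>", True),    # student decides to buy
        Segment(teacher_hint, False),                              # purchased teacher tokens
        Segment(f" <ACCEPT> {answer}", True),                      # student trusts and finishes
    ]


def render(segments: List[Segment]) -> str:
    """Concatenate segments into the training string."""
    return "".join(s.text for s in segments)


def trainable_char_mask(segments: List[Segment]) -> List[bool]:
    """Character-level loss mask; a real pipeline would build this per token."""
    mask: List[bool] = []
    for s in segments:
        mask.extend([s.trainable] * len(s.text))
    return mask


segs = build_hsp_trajectory(
    "What is 17 * 24?", "17 * 24 = 17 * 20 + 17 * 4.", 32, " 340 + 68 = 408.", "408")
print(render(segs))
```

Masking the purchased tokens trains the model on when and how much to buy, and how to continue afterwards, rather than on reproducing the teacher's text; a distillation-oriented variant could leave those tokens trainable instead.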

RL Reinforcement Learning Stage

The policy is optimized with the GRPO algorithm, an HSP rollout state machine manages procurement decisions during generation, and the reward function evaluates:

  • Procurement efficiency (solving problems with the fewest steps)
  • Answer correctness
  • Autonomy balance
  • Trust calibration
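
The following Python sketch shows one way the four criteria could be folded into a scalar reward and then turned into GRPO's group-relative advantages, where each rollout's reward is normalized by the mean and standard deviation of its group. Only the advantage normalization follows the published GRPO recipe; every reward term, weight, and name here (`Rollout`, `hsp_reward`, `W_EFF`, `TARGET_TEACHER_SHARE`, and so on) is a hypothetical choice, not InfoBuy's actual reward.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Rollout:
    correct: bool                      # final answer matches the reference
    num_purchases: int                 # number of <ASK>/<VERIFY> actions
    teacher_tokens: int                # teacher tokens actually received
    total_tokens: int                  # all tokens in the trajectory
    accepted: Optional[bool]           # emitted <ACCEPT>? None if nothing was bought
    teacher_was_right: Optional[bool]  # was the purchased information correct?


# Hypothetical weights; the source does not state the framework's values.
W_EFF, W_AUTO, W_TRUST = 0.1, 0.5, 0.2
TARGET_TEACHER_SHARE = 0.3  # hypothetical tolerated share of teacher tokens


def hsp_reward(r: Rollout) -> float:
    reward = 1.0 if r.correct else 0.0                         # answer correctness
    reward -= W_EFF * r.num_purchases                          # procurement efficiency
    share = r.teacher_tokens / max(r.total_tokens, 1)
    reward -= W_AUTO * max(0.0, share - TARGET_TEACHER_SHARE)  # autonomy balance
    if r.accepted is not None and r.teacher_was_right is not None:
        # Trust calibration: reward accepting good info, penalize accepting bad info
        # (and penalize withholding trust from good info).
        reward += W_TRUST if r.accepted == r.teacher_was_right else -W_TRUST
    return reward


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: normalize rewards within one group of rollouts."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


group = [
    Rollout(True, 0, 0, 200, None, None),     # solved alone
    Rollout(True, 2, 120, 300, True, True),   # bought twice, trusted good info
    Rollout(False, 1, 40, 250, True, False),  # trusted a wrong hint
]
rewards = [hsp_reward(r) for r in group]
advantages = grpo_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Because advantages are computed within a group of rollouts for the same problem, a rollout that bought help receives a lower advantage whenever its siblings solved the problem correctly with fewer purchases.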

Section 05

Project Structure and Engineering Practice

The repository is organized as follows:

  • SFT_stage/: Protocol SFT data construction, training scripts
  • RL_stage/: GRPO configuration, state machine, reward function
  • eval/: Collaborative generation and evaluation tools
  • setup/: Environment configuration scripts
  • docs/hsp/: Method documentation and training instructions
  • utils/: Teacher service tools

Large files (model weights, datasets) are kept outside the repository in a location given by the INFOBUY_STORE environment variable, so they are never committed to Git.
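
A minimal sketch of how code might resolve paths under INFOBUY_STORE, assuming the variable holds a directory path. The helper `store_path` and the `checkpoints/sft` sub-path are hypothetical; consult the repository's docs for the real layout.

```python
import os
from pathlib import Path


def store_path(*parts: str) -> Path:
    """Resolve a path inside the large-file store named by INFOBUY_STORE (hypothetical helper)."""
    root = os.environ.get("INFOBUY_STORE")
    if not root:
        # Fail loudly instead of silently writing weights into the Git checkout.
        raise RuntimeError("INFOBUY_STORE is not set; point it at a directory outside the repository")
    return Path(root).expanduser().joinpath(*parts)


# Example: with INFOBUY_STORE=/data/infobuy this resolves to /data/infobuy/checkpoints/sft
os.environ.setdefault("INFOBUY_STORE", "/data/infobuy")
print(store_path("checkpoints", "sft"))
```

Raising an error when the variable is unset is a deliberate choice: it avoids quietly falling back to a path inside the working tree, which is exactly what the variable exists to prevent.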

Section 06

Research Significance and Application Prospects

Theoretical Contributions

InfoBuy formalizes large-small model collaboration as an economic decision problem and optimizes it using concepts from information economics.
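
One illustrative way to read this economic framing (a textbook value-of-information rule, not a formula taken from the project) is that a purchase a in reasoning state s is worth making only when its expected gain in accuracy exceeds its price. Here c(a) is the purchased token budget (the N in <ASK>N</ASK>) and λ is an assumed exchange rate between tokens and accuracy:

```latex
\Delta(a \mid s) = \Pr[\text{correct} \mid s, a] - \Pr[\text{correct} \mid s],
\qquad
\text{buy } a \iff \Delta(a \mid s) > \lambda \, c(a)
```

For example, if an <ASK>64</ASK> purchase is expected to raise accuracy from 0.55 to 0.80 and λ = 0.002 per token, then Δ = 0.25 exceeds λ·c(a) = 0.128, so the purchase is worth making.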

Practical Value

  • Edge computing: On-device small models procure information from cloud-based large models on demand
  • Cost-sensitive applications: Reduce API call costs while maintaining quality
  • Progressive capability improvement: Small models expand their capability boundaries by learning to seek help

Educational and Research Tools

The project provides a complete training pipeline and evaluation tools, supporting exploration of reward design, strategy variants, and domain-specific applications.

Section 07

Technical Challenges and Future Directions

Challenges

  • Trust calibration: Small models need to balance credulity and skepticism towards teacher outputs
  • Dynamic procurement costs: Strategies must adapt to changes in teacher-model latency and cost
  • Multi-round procurement optimization: Planning the optimal sequence of purchases for complex problems

Future Directions

  • Introduce conditional/batch procurement strategies
  • Explore multi-teacher information source selection
  • Extend to multi-modal tasks

Section 08

Summary: Value and Outlook of the InfoBuy Framework

InfoBuy provides a structured framework for large-small model collaborative reasoning, turning the intuition of information procurement into a trainable policy-learning problem. Through two-stage training, the small model learns to balance autonomy against external help, pointing toward more efficient and cost-effective AI systems. It is a useful starting point for developers and researchers working on model efficiency, edge deployment, or large-small model collaboration.