InfoBuy: A Strategy Learning Framework for Information Procurement in Large-Small Model Collaborative Reasoning

InfoBuy models large-small model collaborative reasoning as an information procurement problem: a small model learns when to purchase prompts, when to purchase verification, how many teacher tokens to buy, and whether to trust the purchased information. It is implemented on the HSP protocol and trained in two stages, SFT followed by RL.

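Tags: large-small model collaboration, information procurement, HSP protocol, reinforcement learning, GRPO, model distillation, inference optimization, open-source project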

Published 2026-06-06 10:55 | Recent activity 2026-06-06 11:22 | Estimated read 8 min

Section 01

InfoBuy Framework Guide: Information Procurement Strategy for Large-Small Model Collaborative Reasoning

InfoBuy is an open-source framework for large-small model collaborative reasoning, developed by nicebro123. Its core idea is to model the collaboration as an information procurement problem: the small model learns when to purchase prompts, when to purchase verification, how many teacher tokens to buy, and whether to trust the purchased information. It is implemented on the HSP protocol and trained in two stages, Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL), offering a new approach to building efficient, cost-effective AI systems.

Section 02

Research Background and Core Motivation

Large models (e.g., GPT-4) reason well but are costly to deploy and slow to respond; small models are lightweight and efficient but struggle with complex reasoning. The key question is how a small model can efficiently "borrow" a large model's capabilities while remaining independent. InfoBuy recasts collaboration as a series of information procurement decisions, so that the small model can adjust its help-seeking strategy dynamically.

Section 03

Core Concept: HSP Information Procurement Protocol

InfoBuy defines a structured information exchange mechanism based on the HSP protocol. Small models procure information from large models through specific tags:

<ASK>N</ASK>: Request up to N tokens of reasoning prompts
<VERIFY>N</VERIFY>: Request up to N tokens of verification services
<ACCEPT>: Adopt and trust the feedback from the teacher model
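
The three tags above can be read as a small action grammar. The following is a minimal sketch of how a rollout loop might extract them from student output and enforce the "up to N tokens" budget. The regular expression, the `Action` class, and the helper names are illustrative assumptions, not code from the InfoBuy repository.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed tag grammar, inferred only from the three tags listed above.
TAG_RE = re.compile(r"<(ASK|VERIFY)>(\d+)</\1>|<(ACCEPT)>")


@dataclass
class Action:
    kind: str        # "ASK", "VERIFY", or "ACCEPT"
    budget: int = 0  # maximum teacher tokens requested (0 for ACCEPT)


def parse_actions(text: str) -> List[Action]:
    """Return every HSP tag found in `text`, in order of appearance."""
    actions = []
    for match in TAG_RE.finditer(text):
        if match.group(3):  # the <ACCEPT> branch matched
            actions.append(Action("ACCEPT"))
        else:               # <ASK>N</ASK> or <VERIFY>N</VERIFY>
            actions.append(Action(match.group(1), int(match.group(2))))
    return actions


def cap_reply(action: Action, teacher_tokens: List[str]) -> List[str]:
    """Truncate the teacher's reply to the purchased token budget."""
    return teacher_tokens[: action.budget]


demo = "Factor the quadratic. <ASK>32</ASK> <ACCEPT> Check it. <VERIFY>16</VERIFY>"
print(parse_actions(demo))
```

Keeping the budget inside the tag means the cost of each request is visible in the generated text itself, which is what lets a reward function charge for it later.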

Section 04

Technical Architecture: Two-Stage Training Process (SFT+RL)

SFT (Supervised Fine-Tuning) Stage

This stage constructs reasoning-trajectory data containing HSP tags and fine-tunes the small model on it using a data collator and trainer. The pre-trained checkpoints from this stage ensure the model has mastered the basics of the protocol.
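
As an illustration of what a trajectory containing HSP tags could look like, the sketch below serializes one example and records which character spans the student would be trained on. The `Segment` class, the "Question:" prefix, and the choice to exclude teacher text from the training targets are all assumptions made for this example; the source does not describe the actual data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    text: str
    from_teacher: bool = False  # assumption: teacher text is context, not a target


def build_example(question: str, segments: List[Segment]) -> Tuple[str, List[Tuple[int, int]]]:
    """Concatenate segments and return (full_text, student_spans).

    student_spans are (start, end) character offsets of the text the
    student wrote, i.e. the parts a loss mask would keep.
    """
    text = f"Question: {question}\n"
    spans = []
    for seg in segments:
        start = len(text)
        text += seg.text + "\n"
        if not seg.from_teacher:
            spans.append((start, len(text)))
    return text, spans


trajectory = [
    Segment("I need to isolate x. <ASK>32</ASK>"),
    Segment("Subtract 3 from both sides first.", from_teacher=True),
    Segment("<ACCEPT> Then 2x = 8, so x = 4."),
]
full_text, student_spans = build_example("Solve 2x + 3 = 11.", trajectory)
print(full_text)
print(student_spans)
```

In a real pipeline the character spans would be converted to token-level label masks by the collator.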

RL (Reinforcement Learning) Stage

This stage uses the GRPO algorithm to optimize the policy, with the HSP Rollout state machine managing procurement decisions. The reward function evaluates:

Procurement efficiency (solving problems with the fewest steps)
Answer correctness
Autonomy balance
Trust calibration
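
To make the four criteria concrete, here is a rough sketch of a scalar reward that combines them, followed by the group-relative advantage normalization that GRPO applies across rollouts sampled for the same question. The `Rollout` fields, the coefficient values, and the way autonomy is expressed as a per-token charge are all assumptions for illustration; the source does not give the actual reward formula.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Rollout:
    correct: bool         # final answer matches the reference
    teacher_tokens: int   # total tokens bought via ASK / VERIFY
    steps: int            # number of procurement calls made
    accepted_wrong: bool  # student accepted teacher feedback that was wrong


def reward(r: Rollout, token_price: float = 0.001,
           step_price: float = 0.05, trust_penalty: float = 0.5) -> float:
    """Hypothetical reward: correctness minus procurement and trust costs."""
    score = 1.0 if r.correct else 0.0        # answer correctness
    score -= step_price * r.steps            # procurement efficiency
    score -= token_price * r.teacher_tokens  # autonomy (assumed per-token charge)
    if r.accepted_wrong:
        score -= trust_penalty               # trust calibration
    return score


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts, as GRPO does."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


group = [
    Rollout(True, 0, 0, False),     # solved alone
    Rollout(True, 100, 2, False),   # solved with two purchases
    Rollout(False, 200, 3, True),   # bought a lot and trusted a wrong hint
]
rewards = [reward(r) for r in group]
print(rewards)
print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the other rollouts in the group, a rollout that solves the problem without buying anything is pushed up relative to one that reaches the same answer at higher cost.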

Section 05

Project Structure and Engineering Practice

The code organization is clear:

SFT_stage/: Protocol SFT data construction, training scripts
RL_stage/: GRPO configuration, state machine, reward function
eval/: Collaborative generation and evaluation tools
setup/: Environment configuration scripts
docs/hsp/: Method documentation and training instructions
utils/: Teacher service tools

Large files (weights, datasets) are managed via the INFOBUY_STORE environment variable to avoid committing large files to Git.
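
A short sketch of how a script might resolve paths under that variable is shown below. Only the variable name INFOBUY_STORE comes from the project description; the fallback directory and the subdirectory names are hypothetical.

```python
import os
from pathlib import Path


def store_path(*parts: str, default: str = "~/infobuy_store") -> Path:
    """Resolve a path under $INFOBUY_STORE (the default is a hypothetical fallback)."""
    root = Path(os.environ.get("INFOBUY_STORE", default)).expanduser()
    return root.joinpath(*parts)


# Hypothetical layout: weights and datasets live outside the Git checkout.
print(store_path("checkpoints", "sft"))
print(store_path("datasets"))
```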

Section 06

Research Significance and Application Prospects

Theoretical Contributions

InfoBuy formalizes large-small model collaboration as an economic decision problem and optimizes that collaboration using concepts from information economics.

Practical Value

Edge computing: On-device small models procure information from cloud-based large models on demand
Cost-sensitive applications: Reduce API call costs while ensuring quality
Progressive capability improvement: Small models expand their capability boundaries by learning to seek help

Educational and Research Tools

The project provides a complete training pipeline and evaluation tools, supporting exploration of reward design, strategy variants, and domain-specific applications.

Section 07

Technical Challenges and Future Directions

Challenges

Trust calibration: Small models need to balance credulity and skepticism towards teacher outputs
Dynamic procurement costs: Need to adapt to changes in teacher model latency and costs
Multi-round procurement optimization: Optimal procurement sequence planning for complex problems

Future Directions

Introduce conditional/batch procurement strategies
Explore multi-teacher information source selection
Extend to multi-modal tasks

Section 08

Summary: Value and Outlook of the InfoBuy Framework

InfoBuy provides a structured framework for large-small model collaborative reasoning, turning the intuition of information procurement into a trainable policy problem. Through two-stage training, the small model learns to balance autonomy against external help, pointing toward more efficient and cost-effective AI systems. The project is worth a closer look for developers and researchers working on model efficiency, edge deployment, or large-small model collaboration.
