Large Language Model (LLM) inference usually relies on centralized cloud services, such as the APIs offered by OpenAI and Anthropic or by major cloud vendors. This model is convenient, but it raises several issues:
- Privacy Risk: User data must be sent to third-party servers.
- Cost: API fees grow with usage.
- Availability Dependency: Service outages, rate limits, or access restrictions directly affect the applications built on them.
- Centralized Control: A few companies control key AI infrastructure.

At the same time, Small Language Models (SLMs) such as Phi-3, Gemma 2B, and Llama 3 8B have improved considerably and can run on consumer-grade hardware. This provides the technical foundation for decentralized inference.

The RNet Inference project emerged from this context. It aims to build a decentralized peer-to-peer (P2P) network that lets users run SLM inference locally or on nearby nodes, enabling distributed, privacy-preserving AI services.