FuseFSS: Efficient and Secure Large Language Model Inference Based on Function Secret Sharing

FuseFSS replaces operator-by-operator protocol design with a unified compilation pipeline, achieving 1.24-1.50x end-to-end acceleration while maintaining accuracy, and significantly reducing communication overhead and preprocessing costs.

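Tags: large language models, secure inference, function secret sharing, privacy computing, secure multi-party computation, FSS, GPU acceleration, fixed-point arithmetic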

Published 2026-06-08 22:30 | Recent activity 2026-06-09 10:51 | Estimated read 6 min

Section 01

[Introduction] FuseFSS: Core Innovations in Efficient and Secure LLM Inference Based on Function Secret Sharing

FuseFSS replaces operator-by-operator protocol design with a unified compilation pipeline, addressing the fragmentation of non-linear operations in secure inference systems based on function secret sharing (FSS). It achieves 1.24-1.50x end-to-end acceleration while maintaining accuracy, and it reduces both communication overhead and preprocessing costs. This article covers the background, the method, the performance results, and the implementation details.

Section 02

Background: Challenges in Secure Inference and Current State of FSS Technology

Background of Privacy Computing

As LLM capabilities improve, protecting both sensitive user data and proprietary model weights has become a pressing concern. Two-server secure inference architectures address this by letting the parties compute jointly while each keeps its own data private.
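To make the two-server idea concrete, here is a minimal sketch of additive secret sharing, a building block commonly used in such architectures. The ring size and the function names are assumptions chosen for illustration; the source does not describe FuseFSS's sharing scheme in this detail.

```python
import secrets

MODULUS = 2**64  # assumed ring Z_{2^64}; an illustrative choice


def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares that sum to x modulo MODULUS."""
    s0 = secrets.randbelow(MODULUS)   # uniformly random, reveals nothing about x
    s1 = (x - s0) % MODULUS
    return s0, s1


def reconstruct(s0: int, s1: int) -> int:
    """Recombine both shares; only someone holding both learns the value."""
    return (s0 + s1) % MODULUS


def add_shares(a: int, b: int) -> int:
    """Each server adds the shares it holds locally, with no communication."""
    return (a + b) % MODULUS


x0, x1 = share(12)
y0, y1 = share(30)
z0, z1 = add_shares(x0, y0), add_shares(x1, y1)
print(reconstruct(z0, z1))  # 42
```

Each server sees only one uniformly random share, so neither learns the inputs on its own. Addition is free in this representation, which is why the costly part of secure inference tends to be the non-linear operations.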

Current State of FSS Technology

FSS is a cryptographic primitive, and FSS-based systems handle linear layers efficiently. Fixed-point non-linear operations (such as ReLU and GELU), however, become a performance bottleneck because of fragmented design: each operator has its own dedicated protocol, which leads to code duplication and makes optimization difficult.
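The defining property of FSS is that a function is split into two keys whose evaluations add up to the function's output, while each key alone hides the function. The sketch below shows only that property, using a deliberately naive construction that secret-shares the whole truth table of a comparison. This is an assumption made for readability: practical FSS schemes use compact keys derived from pseudorandom generators rather than tables that grow with the domain, and all names here are hypothetical.

```python
import secrets

MODULUS = 2**16      # assumed output ring, for illustration
DOMAIN_BITS = 8      # tiny input domain so the truth table stays small


def gen_comparison_keys(alpha: int) -> tuple[list[int], list[int]]:
    """Naively share f(x) = 1 if x < alpha else 0 as two random-looking tables."""
    size = 1 << DOMAIN_BITS
    key0 = [secrets.randbelow(MODULUS) for _ in range(size)]
    key1 = [((1 if x < alpha else 0) - key0[x]) % MODULUS for x in range(size)]
    return key0, key1


def eval_key(key: list[int], x: int) -> int:
    """Each server evaluates its own key at x and obtains one output share."""
    return key[x]


def reconstruct(y0: int, y1: int) -> int:
    return (y0 + y1) % MODULUS


k0, k1 = gen_comparison_keys(alpha=100)
print(reconstruct(eval_key(k0, 42), eval_key(k1, 42)))    # 1, since 42 < 100
print(reconstruct(eval_key(k0, 200), eval_key(k1, 200)))  # 0, since 200 >= 100
```

Key generation happens in a preprocessing phase, which is why key generation time and key size appear as separate metrics in evaluations of FSS-based systems.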

Section 03

Methodology: Innovation of FuseFSS's Unified Compilation Pipeline

FuseFSS replaces operator-by-operator protocols with a unified compilation pipeline:

Core Design: A general operator description format specifies each operator through interval partitioning, low-degree arithmetic fragments, and predicate bits.
Compiler Output:
- Packed Comparison: Merges multiple interval-boundary comparisons to reduce communication rounds.
- Vector Interval Lookup: An FSS-based secure table lookup that optimizes the arithmetic operations.
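The design above can be pictured with a plaintext sketch: an operator is described by sorted interval boundaries plus one low-degree polynomial per interval, and evaluation compares the input against all boundaries at once, uses the resulting predicate bits to pick an interval, and then evaluates that interval's polynomial. The `OperatorSpec` format and function names are hypothetical, since the source does not give the actual description format, and nothing here is cryptographically protected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorSpec:
    """Hypothetical operator description: intervals plus per-interval polynomials."""
    boundaries: tuple[int, ...]            # sorted interval boundaries
    pieces: tuple[tuple[int, ...], ...]    # coefficients per interval, lowest degree first


def packed_compare(x: int, boundaries: tuple[int, ...]) -> list[int]:
    """Plaintext stand-in for a packed comparison: one predicate bit per boundary."""
    return [1 if x >= b else 0 for b in boundaries]


def evaluate(spec: OperatorSpec, x: int) -> int:
    bits = packed_compare(x, spec.boundaries)
    index = sum(bits)              # sorted boundaries, so the bit count is the interval index
    coeffs = spec.pieces[index]    # plaintext stand-in for the interval lookup
    result = 0
    for c in reversed(coeffs):     # Horner's rule for the low-degree polynomial
        result = result * x + c
    return result


# ReLU: 0 for x < 0, x otherwise.
relu = OperatorSpec(boundaries=(0,), pieces=((0,), (0, 1)))
# Clipped ReLU: 0 below 0, x on [0, 6), 6 from 6 upward.
relu6 = OperatorSpec(boundaries=(0, 6), pieces=((0,), (0, 1), (6,)))

print(evaluate(relu, -5), evaluate(relu, 7))                          # 0 7
print(evaluate(relu6, -1), evaluate(relu6, 3), evaluate(relu6, 10))   # 0 3 6
```

Describing ReLU and a clipped ReLU with the same data structure illustrates the appeal of a unified format: a new operator becomes a new table of boundaries and coefficients rather than a new protocol.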

Section 04

Evidence: Quantitative Analysis of FuseFSS's Performance Improvement

Experimental results show:

End-to-end Acceleration: 1.24-1.50x (accuracy maintained);
Communication Overhead: Online communication volume reduced by 9%-16%;
Preprocessing Optimization: Key generation time reduced by 14%-23%, key size shrunk by 20%-24%.
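A speedup factor and a percentage reduction are easy to conflate, so a short conversion helps when reading these figures side by side. The helper below is simple arithmetic on the reported speedup range and adds no new measurements; the function name is illustrative.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of the baseline runtime saved for a given speedup factor."""
    return 1.0 - 1.0 / speedup


for s in (1.24, 1.50):
    print(f"{s:.2f}x speedup -> {latency_reduction(s):.1%} less time")
# 1.24x speedup -> 19.4% less time
# 1.50x speedup -> 33.3% less time
```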

Section 05

Technical Implementation Details: Fixed-Point and Batch Processing Optimization

Fixed-Point Operation Handling

Fixed-point values are mapped to integer operations, and interval partitioning and coefficient selection are chosen to balance accuracy against overhead.
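The trade-off between accuracy and overhead can be seen in a plaintext sketch that approximates GELU with a piecewise-linear function in fixed-point arithmetic. The 12 fractional bits, the range [-4, 4], the equal-width intervals, and all function names are assumptions for illustration; the source does not state how FuseFSS picks its intervals or coefficients.

```python
import math

FRAC_BITS = 12               # assumed number of fractional bits
SCALE = 1 << FRAC_BITS


def encode(x: float) -> int:
    """Map a real value to its fixed-point integer representation."""
    return round(x * SCALE)


def decode(v: int) -> float:
    return v / SCALE


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and truncate back to FRAC_BITS."""
    return (a * b) >> FRAC_BITS


def gelu(x: float) -> float:
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_segments(lo: float, hi: float, count: int):
    """Equal-width intervals with one fixed-point (slope, intercept) pair each."""
    width = (hi - lo) / count
    edges = [lo + i * width for i in range(count + 1)]
    pieces = []
    for left, right in zip(edges, edges[1:]):
        slope = (gelu(right) - gelu(left)) / (right - left)
        intercept = gelu(left) - slope * left
        pieces.append((encode(slope), encode(intercept)))
    return [encode(e) for e in edges[1:-1]], pieces


def approx_gelu(x_fx: int, boundaries, pieces) -> int:
    index = sum(1 for b in boundaries if x_fx >= b)   # which interval x falls in
    slope, intercept = pieces[index]
    return fx_mul(slope, x_fx) + intercept


def max_error(count: int, lo: float = -4.0, hi: float = 4.0, samples: int = 801) -> float:
    boundaries, pieces = build_segments(lo, hi, count)
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        got = decode(approx_gelu(encode(x), boundaries, pieces))
        worst = max(worst, abs(got - gelu(x)))
    return worst


for count in (4, 16, 64):
    print(count, max_error(count))
```

More intervals lower the approximation error but require more boundary comparisons and a larger coefficient table, which is the balance the partitioning has to strike.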

Batch Processing Strategy

Multi-element operations are packed automatically to amortize the fixed cost of FSS evaluation.
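The amortization argument can be written as a toy cost model: every evaluation call pays a fixed cost once, plus a cost per element, so larger batches mean fewer fixed payments. The cost values below are abstract, made-up units chosen to show the shape of the effect. They are not measurements of FuseFSS, and the function names are hypothetical.

```python
import math


def count_invocations(num_elements: int, batch_size: int) -> int:
    """Number of evaluation calls needed when elements are packed into batches."""
    return math.ceil(num_elements / batch_size)


def total_cost(num_elements: int, batch_size: int,
               fixed_cost: int, per_element_cost: int) -> int:
    """Toy model: each call pays fixed_cost once, plus a cost for every element."""
    calls = count_invocations(num_elements, batch_size)
    return calls * fixed_cost + num_elements * per_element_cost


# Abstract units only: 1024 elements, fixed cost 100, per-element cost 1.
print(total_cost(1024, batch_size=1, fixed_cost=100, per_element_cost=1))    # 103424
print(total_cost(1024, batch_size=256, fixed_cost=100, per_element_cost=1))  # 1424
```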

Compatibility

The generated FSS evaluation can be integrated into existing FSS libraries without rewriting the underlying cryptographic implementation.

Section 06

Application Scenarios: Privacy Protection and Cross-Organization Collaboration

Privacy-Preserving Inference Services: Suitable for sensitive fields such as healthcare and finance;
Model-as-a-Service (MaaS) Enhancement: Protect intellectual property rights of model weights;
Cross-Organization Collaboration: Support scenarios like joint risk control and cross-institutional medical research.

Section 07

Limitations and Future Work Directions

Current Limitations

Limited operator coverage (mainly for common activation functions);
Experiments focused on BERT/GPT-style models; ultra-large-scale models need exploration;
GPU optimization is not directly applicable to other accelerators;

Future Directions

Future work includes expanding operator and model-architecture coverage, exploring hybrid solutions that combine FSS with TEEs, building tools for tuning the accuracy-performance trade-off, and supporting dynamic model updates.

Section 08

Conclusion: Significance and Prospects of FuseFSS

FuseFSS addresses the fragmentation problem of FSS-based secure inference through a unified compilation pipeline, delivering measurable performance improvements and a scalable, maintainable architectural paradigm. As privacy computing grows in importance, it offers key infrastructure for building trusted AI systems and is expected to help bring more privacy-preserving LLM applications into practice.
