Zing Forum

Social Draft: On-Device AI Social Reply Assistant, Making Every Response Just Right

An iOS on-device-first social reply assistant app that helps users find natural, thoughtful responses in awkward moments through local large model inference, distilled SFT training, and user-controllable LoRA personalization.

Tags: On-Device AI · Social Assistant · LoRA · Local LLM · iOS App · Privacy · SwiftUI · llama.cpp · AI Reply Suggestions · Personalized Models
Published 2026-04-26 15:15 · Last activity 2026-04-26 15:21 · Estimated read: 6 min

Section 01

Introduction: Social Draft, an On-Device AI Social Reply Assistant That Makes Every Response Just Right

Social Draft is an on-device-first iOS social reply assistant app. Its core goal is to relieve the awkward anxiety users feel when they don't know how to respond to a message. It positions itself as a "reply advisor" rather than a chatbot: the final decision on what to send always stays in the user's hands. Through local LLM inference, distilled SFT training, and user-controllable LoRA personalization, it offers natural, thoughtful reply suggestions, while its on-device-first design protects privacy: in local mode, data never leaves the device.

Section 02

Background: Pain Points of Social Replies and Product Positioning

The product targets common awkward moments in real conversations: declining an invitation when you're tired, politely turning down an unwanted event, responding appropriately in emotionally charged exchanges, keeping work or school replies concise and professional, and simply struggling to put a feeling into words. Unlike AI products on the market that "speak for users", Social Draft chooses to "help users speak", leaving the final say over every reply with the user.

Section 03

Methodology: On-Device-First Technical Architecture and Personalization Support

The app offers three interchangeable backends: Mock (the default safe mode, requiring no model or API key), Cloud (calling APIs such as OpenAI, Anthropic, or Gemini), and Local (an on-device GGUF model, privacy-first). Local mode is the core: it needs no network, and all data stays on the device. Users can also train LoRA adapters locally for style personalization, privacy protection, and lightweight adaptation. It currently supports the Llama-3.2-1B/3B-Instruct-Q4_K_M models and the reply_sft_lora_v1 adapter.
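The three-backend split can be sketched as a simple dispatcher. This is a minimal Python sketch, not the app's Swift code; the function names and the Mock-mode canned reply are illustrative assumptions.

```python
from enum import Enum, auto

class Backend(Enum):
    MOCK = auto()   # default safe mode: canned suggestions, no model or API key
    CLOUD = auto()  # OpenAI / Anthropic / Gemini over the network
    LOCAL = auto()  # on-device GGUF model, nothing leaves the device

def local_generate(message: str) -> str:
    # Placeholder for on-device llama.cpp inference against a GGUF model.
    return f"[local draft for: {message[:30]}]"

def suggest_reply(message: str, backend: Backend) -> str:
    if backend is Backend.MOCK:
        return "Thanks for the invite! I'm wiped tonight - rain check?"
    if backend is Backend.CLOUD:
        raise NotImplementedError("would call the configured cloud provider here")
    return local_generate(message)
```

The point of the Mock default is that the app is fully navigable with no model downloaded and no key configured; Local and Cloud are opt-in upgrades.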

Section 04

Evidence: Core Features Enable Natural and Appropriate Responses

Core features include smart suggestion cards in multiple styles (natural, direct, friendly, thoughtful, decision-oriented); Ghost Text inline completion, which runs locally to speed up drafting; context awareness, which reads recent messages to understand tone and topic; and reply targeting, which generates suggestions for a specific message.
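Style cards, context awareness, and reply targeting all meet in the prompt that gets sent to the model. A minimal Python sketch, assuming a plain-text prompt format and a five-turn context window (both assumptions, not the app's actual template):

```python
STYLES = ("natural", "direct", "friendly", "thoughtful", "decision-oriented")

def build_prompt(history, target, style):
    """history: list of (speaker, text) tuples; target: the message being
    answered; style: which suggestion card is being generated."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    # Context awareness: only the most recent turns inform tone and topic.
    context = "\n".join(f"{who}: {text}" for who, text in history[-5:])
    return (
        f"Recent conversation:\n{context}\n\n"
        f'Draft a {style} reply to: "{target}"\n'
        "Keep it short and in the user's own voice."
    )
```

Generating one prompt per style yields the row of suggestion cards; Ghost Text would instead append the user's partial draft and ask the model to continue it.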

Section 05

Research Support: Complete Workflow from Data to Model

The repository includes a two-stage Claude distillation pipeline for generating synthetic social-reply datasets, complete LoRA/SFT training notebooks so users can train personalized models on their own dialogue data, and tools in the Experiments_Benchmarks directory for comparing reply quality between local and cloud models.
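The bridge between the distillation stage and the SFT notebooks is a formatting step that turns each synthetic (context, target, style, reply) example into a chat-format training record. A sketch under the assumption that the data uses the common `messages` schema; the actual schema of reply_sft_lora_v1's training set may differ:

```python
def to_sft_record(context, target, style, reply):
    """context: list of 'Speaker: text' strings from the synthetic dialogue;
    reply: the distilled teacher (Claude) reply used as the training target."""
    prompt = (
        "Recent conversation:\n"
        + "\n".join(context)
        + f"\n\nSuggest a {style} reply to: {target}"
    )
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ]
    }
```

Records in this shape can be fed straight into standard SFT tooling, with a LoRA config so only the adapter weights are trained.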

Section 06

Technical Implementation: iOS Architecture and Multi-Backend Support

The app is built with SwiftUI in a layered architecture (AppRootView, Features/Chat/Settings, etc.). Local inference goes through llama.xcframework: loading GGUF models, mounting LoRA adapters, and tuning the sampling chain. Cloud mode supports OpenAI, Anthropic, Groq, and others, and Supabase integration provides data persistence for the demo chat service (thread management, real-time events, etc.).
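The GGUF-plus-LoRA loading that llama.xcframework does on-device has a desktop analogue in llama-cpp-python, which is handy for reproducing the app's setup in the training notebooks. A sketch; the adapter filename is an assumption, and the import is deferred so the helper is usable without the library installed:

```python
def llama_init_kwargs(model_path, lora_path=None, n_ctx=2048):
    # Arguments mirror llama-cpp-python's Llama(...) constructor.
    kwargs = {"model_path": str(model_path), "n_ctx": n_ctx}
    if lora_path is not None:
        kwargs["lora_path"] = str(lora_path)  # mount the LoRA adapter at load time
    return kwargs

def load_local_model(model_path, lora_path=None):
    # Deferred import: pip install llama-cpp-python
    from llama_cpp import Llama
    return Llama(**llama_init_kwargs(model_path, lora_path))
```

On iOS the equivalent calls go through the llama.cpp C API exposed by llama.xcframework, but the shape of the configuration (model path, context size, optional adapter) is the same.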

Section 07

Privacy and Ethics: On-Device-First Design Protects User Rights

Core functions run locally; users keep control over what gets sent; the three modes are transparently distinguished; and only the necessary context is sent to the cloud. By positioning itself as an auxiliary tool that helps users express themselves better, it avoids the ethical risk of AI replacing genuine social interaction.
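The "only necessary context" rule amounts to a trimming step before any cloud request. A minimal sketch; the field names and the three-turn budget are assumptions, not the app's actual schema:

```python
def minimal_cloud_context(history, max_turns=3):
    # Keep only the most recent turns, and strip everything except role and
    # text (dropping sender IDs, timestamps, etc.) before the request leaves
    # the device. Local mode skips this entirely: nothing is sent anywhere.
    return [{"role": m["role"], "text": m["text"]} for m in history[-max_turns:]]
```

This keeps the cloud provider from ever seeing the full conversation history or any per-message metadata, which is the practical meaning of "only necessary context".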

Section 08

Conclusion and Recommendations: Responsible Practice of AI-Assisted Social Interaction

Social Draft shows how large-model technology can be turned into a product that solves a real user pain point. Its strengths are its on-device-first design, LoRA personalization, and emphasis on privacy. For users, it eases social anxiety; for developers, it is a worthwhile on-device AI project to study, with a complete application implementation and research workflow.