# SkyPhusion LLM: A Multimodal AI Playground Built on a Single Cloudflare Worker

> A full-featured multimodal AI playground deployed on a single Cloudflare Worker, supporting 35 chat models, voice conversations, image/video/music generation, RAG (Retrieval-Augmented Generation), and project knowledge base management.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-13T03:42:44.000Z
- Last activity: 2026-06-13T03:51:16.459Z
- Popularity: 152.9
- Keywords: Cloudflare, AI, multimodal, Worker, RAG, voice chat, image generation, video generation, open source
- Page link: https://www.zingnex.cn/en/forum/thread/skyphusion-llm-cloudflare-worker-ai-playground
- Canonical: https://www.zingnex.cn/forum/thread/skyphusion-llm-cloudflare-worker-ai-playground
- Markdown source: floors_fallback

---

## SkyPhusion LLM: A Full-Featured Multimodal AI Playground on a Single Cloudflare Worker

SkyPhusion LLM is a full-featured multimodal AI playground deployed entirely on a single Cloudflare Worker. It integrates 35 chat models from 5 providers and supports hands-free voice chat, image/video/music generation, RAG retrieval, and project knowledge base management. The project, published on GitHub by skyphusion-labs, shows that Cloudflare's stack can host a rich AI app without a complex server architecture, using TypeScript and no extra frameworks.

## Background & Project Overview

### Original Source
- Author/Maintainer: skyphusion-labs
- Platform: GitHub
- Repo: skyphusion-llm-public
- Link: https://github.com/skyphusion-labs/skyphusion-llm-public
- Update Time: 2026-06-13T03:42:44Z

### Project Overview
The playground runs as a single Cloudflare Worker. Beyond the 35 chat models from 5 providers, it covers voice dialogue, image/video/music generation, TTS/STT, RAG, and project KB management. Its core value is as a showcase of Cloudflare's capabilities: deployment is simple, there are no servers to manage, and the code is TypeScript without extra frameworks.

## Core Technical Architecture

### Unified AI Call Interface
A single `env.AI.run()` binding covers:
- Chat (35 models across 5 providers)
- Visual input (image understanding)
- Image/video/music generation
- TTS (Aura-2, MeloTTS)
- STT (Whisper, Deepgram Nova-3)
- Streaming voice chat (Deepgram Flux)
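
As a rough illustration of the unified call style listed above, the following sketch wraps a chat request around a binding shaped like Workers AI's `run(model, inputs)`. The `AiBinding` interface, the `chat` helper, and the assumed `{ response: string }` result shape are illustrative assumptions, not code from the project.

```typescript
// Minimal structural stand-in for the Workers AI binding (assumption: the real
// type is `Ai` from @cloudflare/workers-types; declared here to stay self-contained).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: sends a message history to one model and returns the reply text.
async function chat(
  ai: AiBinding,
  model: string,
  messages: ChatMessage[],
): Promise<string> {
  const result = await ai.run(model, { messages });
  // Assumption: non-streaming text models answer with an object holding `response`.
  if (typeof result === "object" && result !== null && "response" in result) {
    return String((result as { response: unknown }).response);
  }
  throw new Error(`Unexpected response shape from ${model}`);
}
```

Inside a Worker's `fetch` handler this would be called as `chat(env.AI, modelId, messages)`, where `modelId` is whichever model identifier the user selected. Keeping the helper independent of the concrete binding type also makes it easy to exercise with a stub in tests.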

### Multi-Provider Support
1. Workers AI: Llama 4 Scout, Llama 3.x, Qwen3 30B, etc.
2. Anthropic: Claude Opus 4.8/4.7, Sonnet 4.6, Haiku 4.5
3. xAI: Grok 4.3, Grok 4.20, Grok Build 0.1
4. OpenAI: GPT 5.5/5.4/5.4 mini, o4-mini
5. Google Gemini: Gemini 3.1 Pro

### Infrastructure Components
- D1: Chat metadata, dialogue history, RAG text blocks
- R2: Binary files (images, audio, video)
- Vectorize: RAG embeddings (768-dimensional BGE-base)
- AI Gateway: Observability, caching, rate limits
- Workflows: Long tasks (video/music generation)
- Access: User email-based access control

## Key Functional Details

### Hands-Free Voice Chat
- Real-time transcription of user speech via Deepgram Flux
- Model responses spoken via Aura-2 TTS
- Works with all 35 chat models; history is saved just like text chats

### RAG Features
- Upload any file (v0.23+) or batches as zip archives (v0.25+)
- PDFs are extracted per page and spreadsheets per sheet; other files are read as UTF-8 text
- Documents are chunked and embedded with BGE-base, then stored in Vectorize and D1
- When RAG is enabled, the top 5 relevant blocks are injected into the system prompt
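
The chunk-then-inject flow in the list above can be sketched as two pure functions. This is a minimal sketch under stated assumptions: the chunk size, overlap, prompt wording, and the names `chunkText`, `buildSystemPrompt`, and `ScoredBlock` are all hypothetical, and the similarity scores are assumed to come from the vector search.

```typescript
interface ScoredBlock {
  text: string;
  score: number; // similarity score from the vector search (higher = closer)
}

// Split text into fixed-size character chunks with overlap.
// The default sizes are assumptions, not the project's settings.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Append the k highest-scoring blocks to the base system prompt.
function buildSystemPrompt(
  basePrompt: string,
  matches: ScoredBlock[],
  k = 5,
): string {
  const top = [...matches].sort((a, b) => b.score - a.score).slice(0, k);
  if (top.length === 0) return basePrompt;
  const context = top.map((b, i) => `[${i + 1}] ${b.text}`).join("\n\n");
  return `${basePrompt}\n\nRelevant context:\n${context}`;
}
```

Numbering the injected blocks is a design choice that lets the model cite which block an answer came from; returning the base prompt unchanged when nothing matches keeps the RAG toggle from altering ordinary chats.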

### Project & KB Management
- Group docs/conversations into projects (v0.20+)
- Per-project system prompt and retrieval scope
- Docs can belong to multiple projects; move conversations between projects

### Image/Video Generation
- Image models: Google Nano Banana Pro, GPT Image 1.5, FLUX 2 Klein, etc. (FLUX 2 supports 4 reference images)
- Video models: Google Veo 3.1, ByteDance Seedance 2.0, MiniMax Hailuo 2.3, etc. (via Workflows)

### UI Design
- Focus mode: single-column centered chat, floating input
- Slide-in sidebar (history, projects, docs)
- Searchable model selector (v0.111+)

## Security & Privacy Measures

- **Cloudflare Access**: Protects the entire Worker URL.
- **User Isolation**: Uses `Cf-Access-Authenticated-User-Email` to isolate conversation history per user.
- **R2 Privacy**: R2 objects carry `customMetadata.user_email`, so cross-user access is blocked even if an object's UUID is guessed.
- **Video Optimization**: For vision models, the client extracts 8 keyframes instead of uploading the full video.
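
The user-isolation and R2 ownership checks described above might look roughly like the sketch below. The header name and the `customMetadata.user_email` field come from the description; the helper names, the minimal `HeaderReader` and `StoredObject` types, and the lowercase normalization are assumptions made for illustration.

```typescript
// Minimal structural types so the sketch runs outside a Worker.
// In a Worker these would be `Headers` and an R2 object, respectively.
interface HeaderReader {
  get(name: string): string | null;
}
interface StoredObject {
  customMetadata?: Record<string, string>;
}

// Read the identity that Cloudflare Access attaches to authenticated requests.
function authenticatedEmail(headers: HeaderReader): string | null {
  const email = headers.get("Cf-Access-Authenticated-User-Email");
  return email ? email.trim().toLowerCase() : null;
}

// Allow a read only when the object's recorded owner matches the caller.
// Objects without an owner, or requests without an identity, are denied.
function canRead(obj: StoredObject | null, email: string | null): boolean {
  if (!obj || !email) return false;
  return obj.customMetadata?.user_email?.toLowerCase() === email;
}
```

Denying by default when either side is missing means a guessed object key alone is never enough to read another user's file.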

## Practical Application Value

1. **Cost-Effective**: Can serve a meaningful volume of AI requests on Cloudflare's free tier.
2. **Simplified Deployment**: A single Worker deployment, with no server cluster to manage.
3. **Multimodal Unification**: One interface for text, image, audio, video.
4. **Scalable**: Low-latency access via Cloudflare's global network.
5. **Privacy-Focused**: Built-in user isolation and access control.

## Conclusion & Developer Takeaways

SkyPhusion LLM is an open-source project that uses Cloudflare's ecosystem to build a full-featured AI playground on a single Worker. It is a useful learning case for developers who want to build edge-based multimodal AI apps: it demonstrates unified interface design, multi-provider integration, a RAG implementation, and long-task handling via Cloudflare Workflows.
