Zing Forum

Guidee: Cross-Platform Desktop AI Assistant, Turning Your Screen into an Intelligent Collaboration Partner

Guidee is a cross-platform desktop AI assistant built on Tauri, LangGraph, and Claude Sonnet 4. Using screen perception, voice interaction, and intelligent routing, it offers instant help or carries out complex tasks while users work.

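Tags: AI assistant, desktop application, Tauri, LangGraph, Claude, screen perception, voice interaction, browser automation, open-source project, agents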
Published 2026-05-19 04:15 | Recent activity 2026-05-19 04:17 | Estimated read 5 min

Section 01

Guidee: Cross-Platform Desktop AI Assistant, Turning Your Screen into an Intelligent Collaboration Partner (Introduction)

Guidee is an open-source, cross-platform desktop AI assistant built on Tauri, LangGraph, and Claude Sonnet 4. It integrates into users' workflows through screen perception, voice interaction, and intelligent routing. It runs as a floating overlay that does not steal focus yet responds immediately, which sets it apart from traditional chat windows and browser plugins, and it follows a privacy-first design.

Section 02

Background: Limitations of AI Assistants and Guidee's Design Philosophy

Most current AI assistants are confined to chat windows or browser plugins. Guidee moves the AI assistant a step closer to system-level integration. Its design philosophy is to bring AI into the workflow instead of forcing users to leave the application they are working in. It appears as a floating overlay near the cursor, balancing responsiveness against intrusiveness.

Section 03

Core Architecture: Intelligent Routing and Four-Layer Perception Stack

Supervisor-First Intelligent Routing Process

  1. Voice wake-up (local Picovoice Porcupine)
  2. Local Whisper.cpp transcribes the voice input
  3. Supervisor Agent analyzes the screenshot and the text intent
  4. Route to an instant answer or a background agent
  5. Floating layer streams the output
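
The routing decision in this flow can be sketched in a few lines of plain Python. This is only an illustration, not Guidee's implementation: the `Request` type, the function names, and the keyword list are all hypothetical, and the keyword check stands in for the model call the Supervisor Agent would actually make.

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical keyword list standing in for the Supervisor Agent's model call.
BACKGROUND_HINTS = ("export", "fill", "send", "search", "summarize", "compose")


@dataclass
class Request:
    transcript: str    # text produced by the speech-to-text step
    screenshot: bytes  # current screen capture, held in memory only


def supervise(request: Request) -> str:
    """Return 'background' for multi-step tasks, otherwise 'instant'."""
    text = request.transcript.lower()
    if any(hint in text for hint in BACKGROUND_HINTS):
        return "background"
    return "instant"


def instant_answer(request: Request) -> Iterator[str]:
    # Placeholder for a direct, streamed model answer.
    yield "Answering: "
    yield request.transcript


def background_agent(request: Request) -> Iterator[str]:
    # Placeholder for handing the task to a background agent.
    yield "Queued task: "
    yield request.transcript


def handle(request: Request) -> Iterator[str]:
    """Route the request, then stream chunks for the overlay to display."""
    route = supervise(request)
    handler = background_agent if route == "background" else instant_answer
    yield from handler(request)
```

Because `handle` is a generator, the overlay could render each chunk as soon as it arrives, which matches the streaming output in the last step.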

Four-Layer Perception Stack

  1. Vision Agent: Claude's visual capabilities analyze interface elements
  2. DOM Agent: Parses HTML to generate precise CSS selectors
  3. Instruction Agent: Converts natural language into operation plans
  4. Action Agent: Executes operations via Playwright, supporting self-correction
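
One way to picture how the four layers hand off to each other is the sketch below. Every function body is a hard-coded placeholder and every name is an assumption made for illustration; a real version would call a vision model, parse the live DOM, and drive a browser. Self-correction is reduced here to a simple bounded retry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    action: str    # e.g. "click"
    selector: str  # CSS selector for the target element


def vision_agent(screenshot: bytes) -> List[str]:
    # Placeholder: a real version would ask a vision model to label UI elements.
    return ["Export button"]


def dom_agent(html: str, labels: List[str]) -> Dict[str, str]:
    # Placeholder: map each label to a selector (hard-coded for illustration).
    return {label: "#export-btn" for label in labels}


def instruction_agent(request: str, selectors: Dict[str, str]) -> List[Step]:
    # Placeholder: plan one click per known element.
    return [Step("click", selector) for selector in selectors.values()]


def action_agent(
    plan: List[Step],
    execute: Callable[[Step], bool],
    max_attempts: int = 2,
) -> List[str]:
    """Run each step, retrying on failure as a stand-in for self-correction."""
    log = []
    for step in plan:
        for attempt in range(1, max_attempts + 1):
            if execute(step):
                log.append(f"{step.action} {step.selector} ok (attempt {attempt})")
                break
        else:
            # Reached only when every attempt failed.
            log.append(f"{step.action} {step.selector} failed")
    return log
```

The `execute` callback is injected so the pipeline can be exercised without a browser; in practice it would wrap the browser automation call.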

Section 04

Multi-Scenario Applications: From Instant Q&A to Complex Tasks

  • Instant Q&A: Explain buttons/errors, respond within 1.5 seconds
  • Browser Automation: Export CSV, fill forms, etc., completed in 2-8 seconds
  • Research Tasks: Search and integrate information, return summaries in 5-15 seconds
  • File Processing: Summarize PDFs, find notes/todos, completed in 3-10 seconds
  • Email Processing: Compose and send emails, completed in 3-6 seconds

Users can view real-time progress via the overlay.
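
The timings quoted for each scenario could be captured as a small lookup table, for example to flag a task that runs longer than its stated range. The dictionary keys and the `over_budget` helper are hypothetical names; only the numbers come from the list above.

```python
# (lower, upper) bounds in seconds, copied from the scenario list.
# Instant Q&A only has an upper bound in the text, so its lower bound is 0.
LATENCY_BUDGETS = {
    "instant_qa": (0.0, 1.5),
    "browser_automation": (2.0, 8.0),
    "research": (5.0, 15.0),
    "file_processing": (3.0, 10.0),
    "email": (3.0, 6.0),
}


def over_budget(task_type: str, elapsed_seconds: float) -> bool:
    """Return True when a task has exceeded the upper bound for its type."""
    _, upper = LATENCY_BUDGETS[task_type]
    return elapsed_seconds > upper
```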

Section 05

Tech Stack and Privacy-First Design

Tech Stack

  • Desktop: Tauri 2 (Rust + React), lightweight and cross-platform
  • AI Model: Claude Sonnet 4 (multimodal capabilities)
  • Agents: LangGraph (Python orchestration)
  • Voice: Local Picovoice Porcupine and Whisper.cpp
  • Browser Automation: Playwright
  • Backend: FastAPI, Redis, BullMQ, Supabase, LangSmith

Privacy Design

Screenshots are not stored, and voice processing runs locally to protect sensitive data.
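
As a rough illustration of the "not stored" idea, a screenshot can be kept in a mutable in-memory buffer that is overwritten once it has been used. This is a best-effort sketch under assumptions, not how Guidee is known to work: `ephemeral_screenshot` and the `capture` callable are hypothetical, and the immutable bytes object returned by `capture` may still linger in memory until it is garbage-collected.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_screenshot(capture: Callable[[], bytes]) -> Iterator[bytearray]:
    """Yield screenshot bytes held in memory only, then zero the buffer."""
    buffer = bytearray(capture())  # never written to disk
    try:
        yield buffer
    finally:
        # Overwrite the pixel data before the buffer is released.
        buffer[:] = bytes(len(buffer))
```

A caller would do its analysis inside the `with` block; after the block exits, the buffer contains only zero bytes.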

Section 06

Open Source Significance and Future Outlook

As an open-source project, Guidee demonstrates a new paradigm: AI as an intelligent layer that understands existing applications. It has potential value in enterprise scenarios such as onboarding new employees and data entry. It represents the evolution of AI from "conversation" to "collaboration", a key step for AI assistants from "toys" to "tools", and it may become an important form of next-generation human-computer interaction.