
YummyCLI: A Multimodal Image Generation Command-Line Tool Designed for AI Agents

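YummyCLI is a multimodal CLI tool designed for both AI Agents and human users. It generates and edits images through models such as Gemini and offers structured JSON output, secure credential storage, and an Agent-native design.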

Tags: CLI, AI Agent, Image Generation, Gemini, Multimodal, Automation, JSON Output, Skill System

Published 2026/04/12 11:39 | Last activity 2026/04/12 11:50 | Estimated reading time: 5 minutes

Section 01

YummyCLI: A Multimodal Image Generation CLI Tool for AI Agents

YummyCLI is an open-source, Agent-native CLI tool designed for both AI Agents and human users. It generates and edits images through models such as Google Gemini, with plans to add Claude, OpenAI, 通义千问 (Tongyi Qianwen), and others. Key features are structured JSON output, OS-native secure credential storage, and a Skill system that lets AI Agents call the tool without extra prompt engineering. In effect, it acts as a bridge between AI Agents and multimodal model services.

Section 02

Background: Challenges for AI Agents in Image Generation

As AI Agents take on more complex tasks, developers need image generation interfaces that are stable, safe, and easy to integrate. Traditional CLIs are optimized for human readers: their output varies in format and is hard to parse programmatically, which makes them a poor fit for Agents. YummyCLI addresses this gap with an Agent-native design.

Section 03

Core Design Principles: Agent-Native & Secure

YummyCLI's core design focuses on Agent compatibility:

Structured Skill System: Built-in Skill files (e.g., yummy-gen-image in ./skills/) guide Agents on image generation/editing without extra prompts.
Standardized JSON Output: All commands output JSON to stdout, enabling Agents to parse results easily.
OS-Native Credential Storage: API keys are stored in OS keychains (macOS Keychain, Linux Secret Service) to avoid plaintext exposure.
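
Because every command writes JSON to stdout, an Agent-side caller can treat the output as data instead of scraping free-form text. The following is a minimal Python sketch of that parsing step. The article does not document the output schema, so the sample payload and its `path` field are hypothetical, the top-level-object check is an assumption, and `parse_cli_json` is an illustrative helper rather than part of YummyCLI.

```python
import json
from typing import Any


def parse_cli_json(stdout: str) -> dict[str, Any]:
    """Parse one JSON document captured from a CLI's stdout.

    Raises ValueError if the text is not valid JSON or is not a JSON object,
    so the caller fails loudly instead of guessing at the result.
    """
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"stdout was not valid JSON: {exc}") from exc
    # Assumption: results arrive as a single top-level JSON object.
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    return payload


# Hypothetical payload; the real field names are not given in this article.
sample_stdout = '{"path": "out/cat.png"}'
result = parse_cli_json(sample_stdout)
```

Raising on malformed output keeps a broken run from being mistaken for a successful one when the Agent chains the result into a later step.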

Section 04

Installation & Key Functionalities

Installation:

Via npm: npm install -g @yummysource/yummycli
Agent Skills: npx skills add yummysource/yummycli -y -g (required for Agents).

Features:

Two image generation entry points: human-friendly (gemini nanobanana) and Agent-friendly (image generate --provider gemini).
Supports text-to-image, single/multi-image editing, custom aspect ratios (1:1 to 21:9, including 9:16), resolutions (512 to 4K), and model options (Flash for speed, Pro for quality).
Provider-agnostic: Switch models via --provider flag without code changes.
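
To illustrate the provider-agnostic entry point, here is a small Python sketch that builds the argument list for the Agent-friendly command. Only the `image generate` subcommand and the `--provider` flag come from the article. The executable name `yummycli` is an assumption, `build_generate_argv` is a hypothetical helper, and the sketch deliberately does not name any other flags: options for prompt, aspect ratio, resolution, or model should be passed through `extra_args` using whatever names the tool's own help output lists.

```python
from typing import Sequence

# Assumption: the installed executable is named "yummycli". The article gives
# only the npm package name and the "image generate --provider" subcommand.
EXECUTABLE = "yummycli"


def build_generate_argv(provider: str, extra_args: Sequence[str] = ()) -> list[str]:
    """Build an argv list for the Agent-friendly image generation entry point.

    Only --provider is taken from the article. Everything in extra_args is
    passed through unchanged and is the caller's responsibility.
    """
    if not provider:
        raise ValueError("provider must be a non-empty string")
    return [EXECUTABLE, "image", "generate", "--provider", provider, *extra_args]


gemini_argv = build_generate_argv("gemini")
# Switching providers changes one argument; the calling code stays the same.
# "openai" is a hypothetical value for a provider the article lists as planned.
other_argv = build_generate_argv("openai")
```

Building the command as a list rather than a single string also avoids shell quoting problems when the arguments later contain free-form prompt text.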

Section 05

Technical Highlights & Innovation

Security: Credentials are stored in OS-native keychains, ensuring encryption and access control.
Structured Output: JSON output for all commands (success/failure) allows Agents to reliably parse results and chain tools.
Skill System: Skill files define usage rules, security policies, and output contracts, enabling Agents to call YummyCLI like a function and reducing integration complexity.

Section 06

Application Scenarios

YummyCLI is useful in:

AI Agent Workflows: Content creation Agents can generate images, parse JSON outputs, and integrate them into products.
Batch Processing: Shell scripts or CI/CD pipelines can generate images in bulk, for example e-commerce product images or social media posts.
Multimodal App Development: Developers can integrate it via subprocess calls, avoiding direct API handling.
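
The subprocess and batch scenarios above can be sketched in Python as follows. This is an illustration under stated assumptions, not YummyCLI's documented integration path: `run_cli` and `run_batch` are hypothetical helpers, each job is whatever argv list the caller has already built, and the shape of the JSON output is not assumed beyond it being JSON. The runner is injectable so the loop can be exercised without the CLI installed.

```python
import json
import subprocess
from typing import Callable, Iterable

Runner = Callable[[list[str]], subprocess.CompletedProcess]


def run_cli(argv: list[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout and stderr as text."""
    return subprocess.run(argv, capture_output=True, text=True, check=False)


def run_batch(jobs: Iterable[list[str]], runner: Runner = run_cli) -> list[dict]:
    """Run each argv in turn and collect one record per job."""
    records = []
    for argv in jobs:
        proc = runner(argv)
        try:
            output = json.loads(proc.stdout)
        except (json.JSONDecodeError, TypeError):
            output = None  # unparseable or missing stdout; keep going
        records.append(
            {"argv": argv, "returncode": proc.returncode, "output": output}
        )
    return records


def fake_runner(argv: list[str]) -> subprocess.CompletedProcess:
    """Stand-in runner for a dry run; returns a made-up JSON payload."""
    return subprocess.CompletedProcess(argv, 0, stdout='{"ok": true}', stderr="")


# Dry run with placeholder commands; no external program is started.
dry_run = run_batch([["job-1"], ["job-2"]], runner=fake_runner)
```

Recording the return code and parsed output per job, rather than stopping at the first failure, lets a script or CI step report which items in a batch need a retry.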

Section 07

Future Outlook & Conclusion

Current Status: Active development with stable core features.
Future Plans: Expand to more providers (Claude, OpenAI, 通义千问), add video generation, and enhance the Skill ecosystem.
Conclusion: YummyCLI represents a new CLI paradigm, one optimized for both humans and AI Agents. It simplifies image generation for developers and provides a reliable interface for Agents, bridging AI capabilities to real-world applications.