# YummyCLI: A Multimodal Image Generation Command-Line Tool Designed for AI Agents

> YummyCLI is a multimodal CLI tool specifically designed for both AI Agents and human users. It supports image generation and editing via models like Gemini, and features structured JSON output, secure credential storage, and agent-native design.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-04-12T03:39:37.000Z
- 最近活动: 2026-04-12T03:50:37.521Z
- 热度: 150.8
- 关键词: CLI, AI Agent, 图像生成, Gemini, 多模态, 自动化, JSON输出, Skill系统
- 页面链接: https://www.zingnex.cn/en/forum/thread/yummycli-ai-agent
- Canonical: https://www.zingnex.cn/forum/thread/yummycli-ai-agent
- Markdown 来源: floors_fallback

---

## YummyCLI: A Multimodal Image Generation CLI Tool for AI Agents

YummyCLI is an open-source, Agent-native CLI tool designed for both AI Agents and human users. It supports image generation and editing via models like Google Gemini (with plans to expand to Claude, OpenAI, 通义千问, etc.). Key features include structured JSON output, OS-native secure credential storage, and a Skill system that enables AI Agents to call it without extra prompt engineering. It acts as a bridge between AI Agents and multi-modal model services.

## Background: Challenges for AI Agents in Image Generation

As AI Agents evolve to handle complex tasks, developers face challenges in providing stable, safe, and easy-to-integrate image generation interfaces. Traditional CLIs are optimized for humans, with variable outputs and complex parsing—making them unsuitable for Agents. YummyCLI addresses this gap by offering an Agent-native design.

## Core Design Principles: Agent-Native & Secure

YummyCLI's core design focuses on Agent compatibility:
1. **Structured Skill System**: Built-in Skill files (e.g., `yummy-gen-image` in `./skills/`) guide Agents on image generation/editing without extra prompts.
2. **Standardized JSON Output**: All commands output JSON to stdout, enabling Agents to parse results easily.
3. **OS-Native Credential Storage**: API keys are stored in OS keychains (macOS Keychain, Linux Secret Service) to avoid plaintext exposure.

## Installation & Key Functionalities

**Installation**:
- Via npm: `npm install -g @yummysource/yummycli`
- Agent Skills: `npx skills add yummysource/yummycli -y -g` (required for Agents).
**Features**:
- Two image generation entry points: Human-friendly (`gemini nanobanana`) and Agent-friendly (`image generate --provider gemini`).
- Supports text-to-image, single/multi-image editing, custom aspect ratios (1:1 to 21:9, including 9:16), resolutions (512 to 4K), and model options (Flash for speed, Pro for quality).
- Provider-agnostic: Switch models via `--provider` flag without code changes.

## Technical Highlights & Innovation

**Security**: Credentials are stored in OS-native keychains, ensuring encryption and access control.
**Structured Output**: JSON output for all commands (success/failure) allows Agents to reliably parse results and chain tools.
**Skill System**: Skill files define usage rules, security policies, and output contracts—enabling Agents to call YummyCLI like a function, reducing integration complexity.

## Application Scenarios

YummyCLI is useful in:
1. **AI Agent Workflows**: Content creation Agents can generate images, parse JSON outputs, and integrate them into products.
2. **Batch Processing**: Ideal for e-commerce product images or social media posts via shell scripts/CI/CD.
3. **Multi-Modal App Development**: Developers can integrate it via subprocess calls, avoiding direct API handling.

## Future Outlook & Conclusion

**Current Status**: Active development with stable core features.
**Future Plans**: Expand to more providers (Claude, OpenAI, 通义千问), add video generation, and enhance the Skill ecosystem.
**Conclusion**: YummyCLI represents a new CLI paradigm—optimized for both humans and AI Agents. It simplifies image generation for developers and provides a reliable interface for Agents, bridging AI capabilities to real-world applications.
