# ComfyUI Multimodal Prompt Generation Nodes: Connecting Visual Large Models and AIGC Workflows

> ComfyUI-MultiModal-Prompt-Nodes is a plugin designed specifically for ComfyUI, supporting the generation and optimization of image/video generation prompts via local Qwen VL series models or Alibaba Cloud DashScope API. Its unique advantage lies in optimization for the Chinese context, providing an efficient prompt engineering solution for domestic multimodal models such as Qwen-Image-Edit and Wan2.2.

- 板块: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- 发布时间: 2026-05-09T06:44:15.000Z
- 最近活动: 2026-05-09T06:53:09.434Z
- 热度: 163.8
- 关键词: ComfyUI, Qwen, 多模态, 提示词工程, 视觉语言模型, AIGC, Wan2.2, 图像生成, 视频生成, GGUF
- 页面链接: https://www.zingnex.cn/en/forum/thread/comfyui-aigc
- Canonical: https://www.zingnex.cn/forum/thread/comfyui-aigc
- Markdown 来源: floors_fallback

---

## [Introduction] ComfyUI Multimodal Prompt Generation Nodes: Connecting Visual Large Models and AIGC Workflows

ComfyUI-MultiModal-Prompt-Nodes is a plugin designed specifically for ComfyUI, supporting the generation/optimization of image/video prompts via local Qwen VL series models or Alibaba Cloud DashScope API. Its core advantage lies in optimization for the Chinese context, providing an efficient prompt engineering solution for domestic multimodal models such as Qwen-Image-Edit and Wan2.2, lowering the threshold for AIGC creation.

## Project Background and Core Positioning

In the AIGC field, prompt engineering is key to generation quality, but it is difficult for ordinary users to write high-quality English prompts. As a ComfyUI custom node, this plugin uses Visual Large Language Models (VLM) to convert simple text/reference images into professional prompts, deeply optimizing Alibaba Cloud Qwen series and Wan2.2 video models to leverage performance advantages in the Chinese context.

## Core Features and Technical Innovations

- **Multimodal Input**: Supports text→prompt, image→prompt, multi-image input (up to 3 images);
- **Flexible Style System**: Built-in five styles: raw/default/detailed/concise/creative;
- **Localized Models**: Supports Qwen2.5-VL/Qwen3-VL/Qwen3.5 (GGUF format, runs on CPU/GPU);
- **Cloud API**: Integrates Alibaba Cloud DashScope API, supports image token compression to reduce costs.

## Deep Optimization for Domestic Models

- **Advantage of Chinese Prompts**: Wan2.2/Qwen-Image-Edit have better understanding of Chinese prompts; it is recommended to set target_language to "zh";
- **Dedicated Nodes**: Vision LLM (general purpose), Qwen Image Edit Prompt Generator (fixes system prompt issues), Wan2.2 Video Prompt Generator (supports 2048-token long text).

## Technical Implementation and Dependency Management

- **llama-cpp-python Version Compatibility**: 
  - Official 0.3.16: Supports Qwen2.5-VL, does not support Qwen3-VL/Qwen3.5;
  - JamePeng branch 0.3.21+: Supports Qwen2.5-VL/Qwen3-VL, does not support Qwen3.5;
  - JamePeng branch 0.3.33+: Supports all three models;
  It is recommended to use the JamePeng branch (requires custom compilation);
- **mmproj Automatic Detection**: Supports automatic matching or manual selection of mmproj files;
- **Model Switching Stability**: After v1.0.6, GGUF processing is improved, and mmproj is correctly reloaded when switching models.

## Installation and Configuration Guide

- **Standard Installation**: Clone to the ComfyUI/custom_nodes directory, execute `pip install -r requirements.txt`;
- **Model Organization**: Place GGUF models in ComfyUI/models/LLM/ or models/text_encoders/;
- **API Configuration**: Create api_key.txt in the plugin directory, fill in the Alibaba Cloud DashScope secret key.

## Usage Scenarios and Best Practices

- **Application Scenarios**: Image generation (optimize Stable Diffusion/FLUX prompts), image editing (generate Qwen-Image-Edit instructions), video generation (Wan2.2 long text support);
- **Recommended Configurations**: Privacy priority → local Qwen3-VL; Quality priority → Qwen-VL-Max cloud; Cost optimization → enable save_tokens; Best effect → target_language=zh.

## Limitations, Version Updates and Conclusion

- **Limitations**: Qwen2.5-VL has insufficient instruction following; environmental dependencies affect visual input effects; v1.0.10 upgrade requires reselecting models;
- **Version Updates**: v1.0.10 expands the model path to models/text_encoders; v1.0.9 fixes system prompt bugs; v1.0.8 supports llama-cpp-python 0.3.16 image input; v1.0.6 improves model processing;
- **Conclusion**: The plugin solves the pain point of prompt generation, has significant localized adaptation value, reflecting trends such as the rise of the domestic model ecosystem and prompt automation.
