
ComfyUI Multimodal Prompt Generation Nodes: Connecting Visual Large Models and AIGC Workflows

Tags: ComfyUI, Qwen, multimodal, prompt engineering, vision-language models, AIGC, Wan2.2, image generation, video generation, GGUF
Published 2026-05-09 14:44 · Last activity 2026-05-09 14:53 · Estimated read: 6 min

Section 01

[Introduction] ComfyUI Multimodal Prompt Generation Nodes: Connecting Visual Large Models and AIGC Workflows

ComfyUI-MultiModal-Prompt-Nodes is a plugin designed specifically for ComfyUI that generates and optimizes image and video generation prompts via local Qwen VL series models or the Alibaba Cloud DashScope API. Its core advantage is optimization for Chinese-language contexts, providing an efficient prompt-engineering solution for domestic multimodal models such as Qwen-Image-Edit and Wan2.2 and lowering the barrier to AIGC creation.


Section 02

Project Background and Core Positioning

In the AIGC field, prompt engineering is key to generation quality, yet ordinary users find it difficult to write high-quality English prompts. As a ComfyUI custom node pack, this plugin uses vision-language models (VLMs) to convert simple text or reference images into professional prompts, with deep optimization for the Alibaba Cloud Qwen series and the Wan2.2 video model so they can play to their strengths with Chinese-language input.


Section 03

Core Features and Technical Innovations

  • Multimodal Input: Supports text→prompt, image→prompt, and multi-image input (up to 3 images);
  • Flexible Style System: Five built-in styles: raw, default, detailed, concise, creative;
  • Localized Models: Supports Qwen2.5-VL/Qwen3-VL/Qwen3.5 in GGUF format, running on CPU or GPU;
  • Cloud API: Integrates the Alibaba Cloud DashScope API and supports image token compression to reduce costs (see the sketch after this list).
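
As a rough illustration of the cloud path, here is a minimal sketch of calling a Qwen VL model through the DashScope Python SDK directly, outside ComfyUI; the model name, image path, and instruction text are placeholders, and the plugin's own node code may differ.

```python
# Minimal sketch (not the plugin's code): asking a Qwen VL model via the
# DashScope SDK to turn a reference image plus a short idea into a prompt.
# Assumptions: `pip install dashscope`; the model name, image path, and
# instruction text below are placeholders.
import dashscope
from dashscope import MultiModalConversation

dashscope.api_key = open("api_key.txt").read().strip()

messages = [{
    "role": "user",
    "content": [
        {"image": "file:///path/to/reference.jpg"},  # placeholder local image
        {"text": "Based on this image, write a detailed Chinese prompt "
                 "for an image generation model."},
    ],
}]

response = MultiModalConversation.call(model="qwen-vl-max", messages=messages)
# content is a list of blocks, e.g. [{"text": "..."}]
print(response.output.choices[0].message.content)
```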

Section 04

Deep Optimization for Domestic Models

  • Advantage of Chinese Prompts: Wan2.2 and Qwen-Image-Edit understand Chinese prompts better, so setting target_language to "zh" is recommended;
  • Dedicated Nodes: Vision LLM (general purpose), Qwen Image Edit Prompt Generator (fixes system prompt issues), Wan2.2 Video Prompt Generator (supports 2048-token long prompts); a hypothetical node skeleton follows this list.
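
For readers unfamiliar with ComfyUI's custom-node interface, the following hypothetical skeleton shows the general shape such a dedicated node takes; the class name, input set (including target_language), and the stand-in generation logic are illustrative, not the plugin's actual code.

```python
# Hypothetical skeleton of a ComfyUI prompt-generator node, for illustration
# only -- the class name, inputs, and stand-in logic are placeholders, not
# the plugin's actual implementation.
class Wan22VideoPromptGeneratorSketch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "idea": ("STRING", {"multiline": True}),
                "style": (["raw", "default", "detailed", "concise", "creative"],),
                "target_language": (["zh", "en"], {"default": "zh"}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate"
    CATEGORY = "prompt"

    def generate(self, idea, style, target_language):
        # A real node would call a local GGUF model or the DashScope API here.
        prompt = f"[{style}/{target_language}] {idea}"  # stand-in for the VLM call
        return (prompt,)

NODE_CLASS_MAPPINGS = {"Wan22VideoPromptGeneratorSketch": Wan22VideoPromptGeneratorSketch}
```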

Section 05

Technical Implementation and Dependency Management

  • llama-cpp-python Version Compatibility (a loading sketch follows this list):
    • Official 0.3.16: supports Qwen2.5-VL, but not Qwen3-VL/Qwen3.5;
    • JamePeng branch 0.3.21+: supports Qwen2.5-VL and Qwen3-VL, but not Qwen3.5;
    • JamePeng branch 0.3.33+: supports all three model families; the JamePeng branch is recommended (requires building from source);
  • mmproj Automatic Detection: Supports automatic matching or manual selection of mmproj files;
  • Model Switching Stability: Since v1.0.6, GGUF handling is improved and mmproj is correctly reloaded when switching models.
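
To make the local path concrete, here is a minimal sketch of loading a Qwen2.5-VL GGUF together with its mmproj file in llama-cpp-python, assuming the Qwen25VLChatHandler that recent releases ship; all file paths are placeholders, and the plugin's internal loader (with its automatic mmproj detection) may differ.

```python
# Minimal sketch: loading a Qwen2.5-VL GGUF plus its mmproj projector with
# llama-cpp-python. Paths and the context size are placeholders.
import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen25VLChatHandler

chat_handler = Qwen25VLChatHandler(clip_model_path="mmproj-Qwen2.5-VL.gguf")
llm = Llama(
    model_path="Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,       # leave room for image tokens plus a long prompt
    n_gpu_layers=-1,  # offload all layers to GPU; use 0 for CPU-only
)

img_b64 = base64.b64encode(open("reference.jpg", "rb").read()).decode()
result = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
        {"type": "text",
         "text": "Describe this image as a detailed generation prompt."},
    ],
}])
print(result["choices"][0]["message"]["content"])
```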

Section 06

Installation and Configuration Guide

  • Standard Installation: Clone the repository into the ComfyUI/custom_nodes directory and run pip install -r requirements.txt;
  • Model Organization: Place GGUF models in ComfyUI/models/LLM/ or ComfyUI/models/text_encoders/;
  • API Configuration: Create api_key.txt in the plugin directory and paste in your Alibaba Cloud DashScope API key (an example layout follows this list).
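
Assuming a default ComfyUI layout, the resulting directory structure looks roughly like this (model filenames are placeholders):

```
ComfyUI/
├── custom_nodes/
│   └── ComfyUI-MultiModal-Prompt-Nodes/
│       └── api_key.txt               # DashScope API key
└── models/
    ├── LLM/
    │   ├── Qwen2.5-VL-7B-Q4_K_M.gguf # placeholder model name
    │   └── mmproj-Qwen2.5-VL.gguf    # matching mmproj file
    └── text_encoders/                # alternative location (v1.0.10+)
```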

Section 07

Usage Scenarios and Best Practices

  • Application Scenarios: Image generation (optimizing Stable Diffusion/FLUX prompts), image editing (generating Qwen-Image-Edit instructions), video generation (Wan2.2 long-prompt support);
  • Recommended Configurations: Privacy first → local Qwen3-VL; quality first → Qwen-VL-Max in the cloud; cost control → enable save_tokens (see the sketch after this list); best results → target_language=zh.
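
How save_tokens compresses image tokens is plugin-internal, but vision-API image costs generally scale with resolution, so the effect is comparable to downscaling the reference image before upload. A minimal sketch of that idea with Pillow (the size cap is an arbitrary placeholder):

```python
# Sketch of the general idea behind image-token saving: fewer pixels in,
# fewer image tokens billed. The 1024px cap is an arbitrary placeholder;
# the plugin's actual save_tokens behavior may differ.
from PIL import Image

def shrink_for_upload(path, max_side=1024):
    img = Image.open(path)
    img.thumbnail((max_side, max_side))  # resizes in place, keeps aspect ratio
    out = path.rsplit(".", 1)[0] + "_small.jpg"
    img.convert("RGB").save(out, quality=85)
    return out
```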

Section 08

Limitations, Version Updates and Conclusion

  • Limitations: Qwen2.5-VL's instruction following is weaker; visual input quality depends on the local llama-cpp-python environment; upgrading to v1.0.10 requires reselecting models;
  • Version Updates: v1.0.10 extends the model search path to models/text_encoders; v1.0.9 fixes system prompt bugs; v1.0.8 adds image input support for llama-cpp-python 0.3.16; v1.0.6 improves model handling;
  • Conclusion: The plugin addresses the pain point of prompt authoring and offers significant value through its localized adaptation, reflecting the rise of the domestic model ecosystem and the trend toward prompt automation.