# MiniMax Token Plan Multimodal Model Hermes Skill Integration Solution

> This project provides Hermes/Codex skill integration for the MiniMax Token Plan multimodal model, supporting functions such as text-to-speech, text-to-image, text-to-video, image-to-video, music generation, search, and visual understanding.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-05T08:53:47.000Z
- Last activity: 2026-05-05T09:24:46.716Z
- Popularity: 159.5
- Keywords: MiniMax, multimodal, Hermes, text-to-video, text-to-image, text-to-speech, music generation, AI skills
- Page link: https://www.zingnex.cn/en/forum/thread/minimax-token-planhermes
- Canonical: https://www.zingnex.cn/forum/thread/minimax-token-planhermes
- Markdown source: floors_fallback

---


## Project Overview

As multimodal large models mature, developers increasingly need convenient tools for integrating AI capabilities across text, image, audio, and video. MiniMax, a large-model provider based in China, offers the Token Plan series of multimodal models, which covers text-to-speech, image generation, video generation, music creation, and other areas.

This project is an open-source Hermes/Codex skill that gives developers a complete integration for the MiniMax Token Plan models. Its multimodal capabilities are invoked through simple command-line tools.

## Supported Models and Functions

This skill integrates the following core MiniMax Token Plan models:

## Text-to-Speech (TTS)

- **Text to Speech HD**: High-quality text-to-speech

## Image Generation

- **image-01**: Text-to-image model

## Video Generation

- **Hailuo-2.3-768P 6s**: Standard quality text-to-video
- **Hailuo-2.3-Fast-768P 6s**: Fast generation version

## Music Generation

- **music-2.5**: Music generation model
- **music-2.6**: Latest version of the music generation model
- **music-cover**: Music cover generation
- **lyrics_generation**: Lyrics generation

## Other Capabilities

- **coding-plan-vlm**: Vision-language model for visual understanding
- **coding-plan-search**: Search enhancement
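
The model lists above can be captured as a small lookup table, which is handy when a command-line wrapper needs to map a requested capability to a model name. The following is a minimal sketch, not the project's actual code: the model names are copied from the lists above, while the capability keys (`tts`, `image`, `video`, `music`, `other`) and the helpers `models_for` and `capability_of` are hypothetical names introduced only for illustration.

```python
from typing import Dict, List

# Model names are copied from the lists above. The grouping keys are
# illustrative names chosen for this sketch, not identifiers from the project.
MODEL_CATALOG: Dict[str, List[str]] = {
    "tts": ["Text to Speech HD"],
    "image": ["image-01"],
    "video": ["Hailuo-2.3-768P 6s", "Hailuo-2.3-Fast-768P 6s"],
    "music": ["music-2.5", "music-2.6", "music-cover", "lyrics_generation"],
    "other": ["coding-plan-vlm", "coding-plan-search"],
}


def models_for(capability: str) -> List[str]:
    """Return a copy of the model names listed for a capability (hypothetical helper)."""
    if capability not in MODEL_CATALOG:
        known = ", ".join(sorted(MODEL_CATALOG))
        raise ValueError(
            f"unknown capability {capability!r}; expected one of: {known}"
        )
    return list(MODEL_CATALOG[capability])


def capability_of(model: str) -> str:
    """Reverse lookup: find the capability group a model name belongs to."""
    for capability, names in MODEL_CATALOG.items():
        if model in names:
            return capability
    raise ValueError(f"unknown model {model!r}")


if __name__ == "__main__":
    print(models_for("video"))
    print(capability_of("music-2.6"))
```

Keeping the catalog in one place means an unknown capability or model fails early with a clear message rather than reaching the remote service with an invalid name.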
