KV-Control: Parameter-Efficient K/V Injection for Trajectory-Controlled Text-to-Motion

KV-Control injects geometric constraints as memory into the key-value pairs of self-attention layers. This enables trajectory control with sub-centimeter precision without modifying the backbone of a pre-trained text-to-motion model, and it provides a lightweight control interface for animation and embodied intelligence applications.

Tags: text-to-motion, trajectory control, KV injection, attention mechanism, PartVQ, parameter-efficient, 3D human motion, transformer adapter

Published 2026-06-04 10:50 | Recent activity 2026-06-05 19:54 | Estimated read 8 min

Section 01

KV-Control: A Lightweight Trajectory Control Method for Text-to-Motion

KV-Control is a parameter-efficient method for trajectory-controlled text-to-motion generation. It injects geometric constraints as memory into the key-value pairs of self-attention layers, which enables sub-centimeter trajectory control without modifying the backbone of the pre-trained text-to-motion model. The result is a lightweight control interface for applications such as animation and embodied intelligence.

Source Info:

Original authors: arXiv author team
Source platform: arXiv
Original title: KV-Control: Parameter-Efficient K/V Injection for Trajectory-Controlled Text-to-Motion
Link: http://arxiv.org/abs/2606.05624v1
Release time: 2026-06-04

Section 02

Background: Control Dilemma in Text-to-Motion Generation

Text-driven 3D human motion generation models can synthesize plausible motions from descriptive prompts, but real-world applications also require precise trajectory control (e.g., root path, end-effector targets) while preserving the quality of the text-conditioned motion. This creates a trade-off between precision (meeting geometric constraints) and preservation (retaining the pre-trained, text-conditioned motion knowledge).

Existing solutions have limitations:

Large-scale modification schemes: Duplicate the generator's structure to gain layer-wise control access, which adds redundant parameters and raises training cost.
Test-time optimization schemes: Shift the control computation to inference, which sacrifices real-time efficiency.

Section 03

Method: Core Mechanism & Supporting Designs of KV-Control

KV-Control is a compact attention-side control interface for frozen text-to-motion Transformers. Its core innovation is injecting geometric constraints as 'memory' into self-attention layers instead of using global tokens or output constraints.
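
As a rough illustration of this idea, the sketch below shows single-head self-attention in NumPy where extra key/value rows are appended as control memory while the queries and the pre-trained projections are left untouched. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name attention_with_kv_memory, the linear adapter matrices a_k and a_v, and all dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_kv_memory(x, w_q, w_k, w_v, mem_k=None, mem_v=None):
    """Single-head self-attention with optional extra key/value memory.

    x:            (T, d) motion tokens
    w_q/w_k/w_v:  (d, d) pre-trained projections, treated as frozen
    mem_k, mem_v: (M, d) control memory from a (hypothetical) trainable adapter
    """
    q = x @ w_q                      # the query stream is never modified
    k = x @ w_k
    v = x @ w_v
    if mem_k is not None:
        # Append control memory as additional keys and values.
        k = np.concatenate([k, mem_k], axis=0)   # (T + M, d)
        v = np.concatenate([v, mem_v], axis=0)   # (T + M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (T, T + M)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, M, d = 6, 3, 8                                # toy sizes, assumptions only
x = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# Hypothetical adapter: a linear map from 3D trajectory points to K/V memory.
traj = rng.normal(size=(M, 3))
a_k = rng.normal(size=(3, d))
a_v = rng.normal(size=(3, d))
mem_k, mem_v = traj @ a_k, traj @ a_v

out_base, w_base = attention_with_kv_memory(x, w_q, w_k, w_v)
out_ctrl, w_ctrl = attention_with_kv_memory(x, w_q, w_k, w_v, mem_k, mem_v)
```

Because the memory only adds columns to the attention matrix, the output keeps its original shape, and omitting the memory recovers the unmodified attention.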

Key Components:

KV Injection: Inject control conditions into key/value pairs of each self-attention layer, keeping pre-trained query streams, text cross-attention, FFNs, and main network weights frozen.
PartVQ: Anatomically aligned part codebook that decomposes actions into semantic body parts for fine-grained control, interpretability, and compression.
T-Concat: Exposes frame-part tokens as attention-addressable sites for precise control over specific time steps and body parts.
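
The part-wise quantization and frame-part addressing described above might look roughly like the following NumPy sketch. Everything specific here is an assumption made for illustration: the part list, the codebook size, the nearest-neighbor lookup, and the frame-major flattening used to give each (frame, part) pair its own token position. The source does not specify these details.

```python
import numpy as np

# Hypothetical body-part split; the actual PartVQ partition is not given here.
PARTS = ["root", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]

def quantize_parts(features, codebooks):
    """Nearest-code lookup with one codebook per body part.

    features:  (T, P, d) per-frame, per-part features
    codebooks: (P, K, d) K codes for each of the P parts
    returns:   (T, P) integer code indices
    """
    diff = features[:, :, None, :] - codebooks[None, :, :, :]  # (T, P, K, d)
    dist = (diff ** 2).sum(axis=-1)                            # (T, P, K)
    return dist.argmin(axis=-1)

def token_index(frame, part, num_parts=len(PARTS)):
    # Position of a (frame, part) token in the frame-major flattened sequence.
    return frame * num_parts + PARTS.index(part)

rng = np.random.default_rng(0)
T, P, K, d = 4, len(PARTS), 16, 8                # toy sizes, assumptions only
codebooks = rng.normal(size=(P, K, d))

# Build features that sit exactly on known codes so the lookup is checkable.
true_codes = rng.integers(0, K, size=(T, P))
features = codebooks[np.arange(P)[None, :], true_codes]        # (T, P, d)

codes = quantize_parts(features, codebooks)      # (T, P)
tokens = codes.reshape(T * P)                    # one token per frame-part pair
left_arm_at_frame_2 = tokens[token_index(2, "left_arm")]
```

Under this layout, a constraint on a given body part at a given time step maps to a single sequence position, which is what makes the site addressable by attention.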

Parameter Efficiency:

Only shared trajectory encoders and lightweight KV injection adapters are trainable, minimizing training overhead.
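
A frozen-backbone training step can be sketched as follows. This is a toy illustration, not the paper's training code: the parameter names, the prefix-based freezing rule, the plain SGD update, and all sizes are hypothetical, and the parameter counts say nothing about the real model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy width, assumption only

# Hypothetical parameter dictionary: a frozen backbone plus small trainable parts.
params = {
    "backbone.w_q": rng.normal(size=(d, d)),
    "backbone.w_k": rng.normal(size=(d, d)),
    "backbone.w_v": rng.normal(size=(d, d)),
    "backbone.ffn": rng.normal(size=(d, 4 * d)),
    "traj_encoder.w": rng.normal(size=(3, d)),
    "adapter.w_k": rng.normal(size=(d, d)),
    "adapter.w_v": rng.normal(size=(d, d)),
}
TRAINABLE_PREFIXES = ("traj_encoder.", "adapter.")

def is_trainable(name):
    return name.startswith(TRAINABLE_PREFIXES)

def sgd_step(params, grads, lr=1e-2):
    # Update only the trainable parameters; frozen ones are passed through.
    return {
        name: (p - lr * grads[name]) if is_trainable(name) else p
        for name, p in params.items()
    }

grads = {name: np.ones_like(p) for name, p in params.items()}  # dummy gradients
new_params = sgd_step(params, grads)

trainable = sum(p.size for n, p in params.items() if is_trainable(n))
total = sum(p.size for p in params.values())
```

The backbone entries come back unchanged, so the pre-trained behavior is preserved by construction while only the encoder and adapter weights move.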

Section 04

Performance: Balancing Precision & Text Condition Quality

KV-Control achieves a balance between trajectory precision and text-conditioned action quality:

Trajectory Tracking Precision:

Root trajectory tracking: Sub-cm level accuracy.
Multi-joint constraints: Meets multiple joint trajectory requirements.
Temporal consistency: Maintains the temporal coherence of the generated motion.
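
To make "sub-cm" concrete, one simple way to measure tracking precision is the mean Euclidean distance between generated and target positions on the constrained frames. The sketch below assumes positions in meters and a hypothetical helper named mean_tracking_error_cm; the exact metric used in the paper is not stated in this summary.

```python
import numpy as np

def mean_tracking_error_cm(pred_m, target_m, mask=None):
    """Mean Euclidean tracking error in centimeters.

    pred_m, target_m: (T, 3) positions in meters (assumed unit)
    mask:             optional (T,) boolean array selecting constrained frames
    """
    err_cm = np.linalg.norm(pred_m - target_m, axis=-1) * 100.0
    if mask is not None:
        err_cm = err_cm[mask]
    return float(err_cm.mean())

# Toy target: a straight 2 m root path sampled at 5 frames.
target = np.stack([np.linspace(0.0, 2.0, 5), np.zeros(5), np.zeros(5)], axis=-1)

# Toy prediction: the same path with a constant 5 mm offset (3-4-5 triangle).
pred = target + np.array([0.003, 0.0, 0.004])

error_cm = mean_tracking_error_cm(pred, target)
is_sub_cm = error_cm < 1.0
```

In this toy case the offset has length 0.005 m, so the error is 0.5 cm and counts as sub-centimeter.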

Text Condition Quality Preservation:

Semantic consistency with text descriptions.
Retains high-level features like gait and style.
Preserves naturalness and fluency of actions.

Section 05

Application Scenarios of KV-Control

KV-Control's lightweight and precise control makes it suitable for:

Animation Production: Adjust specific details (e.g., character path, hand position) without re-generating the entire action.
Embodied Intelligence & Robotics: Apply to obstacle avoidance, precise end effector operations, and multi-constraint task execution.
Game Development: Enable character movement along specific paths, precise interaction with environment objects, and style-consistent actions for level design.

Section 06

Limitations & Future Research Directions

Current Limitations:

Focuses only on geometric trajectory constraints; other constraints (physical, social) need exploration.
Generalization to unseen action types requires further verification.
Extending to multi-agent interaction scenarios is a challenge.

Future Directions:

Explore other types of constraints (physical, social).
Improve generalization to unseen actions.
Extend to multi-agent scenarios.
Apply the KV injection idea to other generative tasks (e.g., image layout control, speech prosody control).

Section 07

Conclusion: Value & Potential of KV-Control

KV-Control reframes trajectory control as a lightweight memory retrieval problem, providing a small, precise, and transparent control interface for text-to-motion generation. Its "frozen main network + lightweight adapter" paradigm balances precise control against the capabilities of the pre-trained model, and the same idea may carry over to controlling other generative models. As embodied intelligence and virtual character applications grow, the ability to switch flexibly between semantic description and precise control is likely to become more important.
