KV-Control: Parameter-Efficient K/V Injection for Trajectory-Controlled Text-to-Motion

KV-Control injects geometric constraints as memory into the key-value pairs of self-attention layers, enabling precise trajectory control (achieving sub-centimeter precision) without modifying the main body of pre-trained text-to-motion models. It provides a lightweight control interface for animation and embodied intelligence applications.

Tags: text-to-motion, trajectory control, KV injection, attention mechanism, PartVQ, parameter-efficient, 3D human motion, transformer adapter
Published 2026-06-04 10:50 | Recent activity 2026-06-05 19:54 | Estimated read 8 min

Section 01

KV-Control: A Lightweight Trajectory Control Method for Text-to-Motion

KV-Control is a parameter-efficient method for trajectory-controlled text-to-motion generation. It injects geometric constraints as memory into the key-value pairs of self-attention layers, which enables sub-centimeter trajectory control while the backbone of the pre-trained text-to-motion model stays unmodified. The result is a lightweight control interface for applications such as animation and embodied intelligence.

Source Info:

  • Original authors: arXiv author team
  • Source platform: arXiv
  • Original title: KV-Control: Parameter-Efficient K/V Injection for Trajectory-Controlled Text-to-Motion
  • Link: http://arxiv.org/abs/2606.05624v1
  • Release time: 2026-06-04

Section 02

Background: Control Dilemma in Text-to-Motion Generation

Text-driven 3D human motion generation models can synthesize plausible motions from descriptive prompts, but real-world applications also require precise trajectory control (e.g., root path, end-effector targets) without degrading text-conditioned motion quality. This creates a trade-off between precision (meeting geometric constraints) and preservation (retaining the pre-trained model's text-conditioned motion knowledge).

Existing solutions have limitations:

  • Large-scale modification schemes: Duplicate parts of the generator to gain layer-level control access, which adds redundant parameters and raises training cost.
  • Test-time optimization schemes: Move the constraint-fitting computation to inference time, which sacrifices real-time efficiency.

Section 03

Method: Core Mechanism & Supporting Designs of KV-Control

KV-Control is a compact attention-side control interface for frozen text-to-motion Transformers. Its core innovation is injecting geometric constraints as 'memory' into self-attention layers instead of using global tokens or output constraints.
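As a rough illustration of what "injecting constraints as memory" can look like, here is a minimal single-head NumPy sketch. It is not the paper's implementation: the adapter matrices `Ak` and `Av`, the encoded constraint tensor `ctrl`, and all shapes are assumptions made for this example. The idea it shows is only that queries still come from the frozen motion stream, while the key/value set is extended with extra rows derived from the trajectory constraints.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_kv_memory(h, Wq, Wk, Wv, mem_k=None, mem_v=None):
    """Single-head self-attention over h (T, d) with optional extra K/V rows.

    Wq, Wk, Wv stand in for frozen pre-trained projections. mem_k and mem_v
    (M, d) are control memory rows appended to the keys and values only;
    the query stream is left untouched.
    """
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    if mem_k is not None:
        k = np.concatenate([k, mem_k], axis=0)   # (T + M, d)
        v = np.concatenate([v, mem_v], axis=0)   # (T + M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (T, T + M)
    return softmax(scores, axis=-1) @ v          # (T, d)

rng = np.random.default_rng(0)
T, M, d = 6, 3, 8                                # frames, memory slots, width
h = rng.normal(size=(T, d))                      # motion hidden states
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # frozen weights

# Hypothetical trainable pieces: encoded constraints plus K/V adapters.
ctrl = rng.normal(size=(M, d))
Ak = rng.normal(size=(d, d)) * 0.1
Av = rng.normal(size=(d, d)) * 0.1
mem_k, mem_v = ctrl @ Ak, ctrl @ Av

base = attention_with_kv_memory(h, Wq, Wk, Wv)
controlled = attention_with_kv_memory(h, Wq, Wk, Wv, mem_k, mem_v)
```

With zero memory rows the function reduces to the plain frozen attention, which is one way to see why this kind of interface leaves the backbone's behavior recoverable.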

Key Components:

  1. KV Injection: Inject control conditions into key/value pairs of each self-attention layer, keeping pre-trained query streams, text cross-attention, FFNs, and main network weights frozen.
  2. PartVQ: An anatomically aligned part codebook that decomposes motion into semantic body parts, supporting fine-grained control, interpretability, and compression.
  3. T-Concat: Exposes frame-part tokens as attention-addressable sites for precise control over specific time steps and body parts.
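The PartVQ and T-Concat components above can be pictured with a small sketch: per-part nearest-neighbor vector quantization, followed by flattening into one token per (frame, part) pair. This is one plausible reading of the summary, not the authors' code. The part layout `PARTS`, the feature slices, the codebook size `K`, and the frame-major token order are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical part layout: each part owns a slice of the per-frame pose vector.
PARTS = {
    "torso": slice(0, 4),
    "left_arm": slice(4, 8),
    "right_arm": slice(8, 12),
    "left_leg": slice(12, 16),
    "right_leg": slice(16, 20),
}

def quantize_parts(motion, codebooks):
    """Return (T, P) indices of the nearest codebook entry per frame and part."""
    codes = []
    for name, sl in PARTS.items():
        feats = motion[:, sl]                                # (T, d_part)
        cb = codebooks[name]                                 # (K, d_part)
        dists = ((feats[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        codes.append(dists.argmin(axis=1))                   # (T,)
    return np.stack(codes, axis=1)

def frame_part_tokens(codes, codebooks):
    """Look up code vectors and flatten to (T * P, d_part), frame-major."""
    names = list(PARTS)
    tokens = []
    for t in range(codes.shape[0]):
        for p, name in enumerate(names):
            tokens.append(codebooks[name][codes[t, p]])
    return np.stack(tokens, axis=0)

def token_index(t, p):
    # Position of the (frame t, part p) token in the flattened sequence.
    return t * len(PARTS) + p

rng = np.random.default_rng(0)
T, K = 5, 16
codebooks = {name: rng.normal(size=(K, 4)) for name in PARTS}
motion = rng.normal(size=(T, 20))

codes = quantize_parts(motion, codebooks)
tokens = frame_part_tokens(codes, codebooks)
left_leg_at_frame_2 = tokens[token_index(2, 3)]
```

Because every (frame, part) pair has its own position in the sequence, an attention layer can address "left leg at frame 2" directly, which is the property the T-Concat description emphasizes.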

Parameter Efficiency:

Only shared trajectory encoders and lightweight KV injection adapters are trainable, minimizing training overhead.
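To make the parameter-efficiency argument concrete, the following back-of-the-envelope count compares frozen and trainable parameters. Every number here is hypothetical and chosen only for illustration: the width `d_model`, the layer count, the low-rank adapter shape, and the encoder size are assumptions, not figures from the paper.

```python
# All sizes below are hypothetical; they are not taken from the paper.
d_model = 256      # hidden width (assumed)
n_layers = 8       # Transformer layers (assumed)
rank = 16          # adapter bottleneck width (assumed low-rank design)

# Frozen backbone per layer: self-attention (4 d^2), text cross-attention
# (4 d^2), and an FFN with 4x expansion (8 d^2). Biases are ignored.
frozen_per_layer = 16 * d_model * d_model
frozen = n_layers * frozen_per_layer

# Trainable: one shared trajectory encoder (assumed to be two d x d maps)
# plus a low-rank K adapter and V adapter in every layer.
shared_encoder = 2 * d_model * d_model
adapter_per_layer = 2 * (d_model * rank + rank * d_model)
trainable = shared_encoder + n_layers * adapter_per_layer

fraction = trainable / (frozen + trainable)
print(f"trainable: {trainable:,} of {frozen + trainable:,} ({fraction:.1%})")
```

Under these assumed sizes only a few percent of the weights receive gradients; the real ratio depends on the actual backbone and adapter design.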


Section 04

Performance: Balancing Precision & Text Condition Quality

KV-Control achieves a balance between trajectory precision and text-conditioned action quality:

Trajectory Tracking Precision:

  • Root trajectory tracking: Sub-cm level accuracy.
  • Multi-joint constraints: Satisfies trajectory constraints on several joints at once.
  • Temporal consistency: Maintains temporal coherence of the generated motion.
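For readers who want to check a "sub-cm" claim on their own outputs, a simple tracking metric is the mean Euclidean distance between generated and target positions on the constrained frames. The summary does not state the paper's exact metric, so the function below is an assumed, generic definition; it also assumes positions are given in metres.

```python
import numpy as np

def trajectory_error_cm(generated, target, mask=None):
    """Mean Euclidean distance in centimetres between two (T, 3) trajectories.

    Positions are assumed to be in metres. If mask (T,) is given, only the
    frames where it is True (the constrained frames) are averaged.
    """
    generated = np.asarray(generated, dtype=float)
    target = np.asarray(target, dtype=float)
    dists = np.linalg.norm(generated - target, axis=-1)      # (T,)
    if mask is not None:
        dists = dists[np.asarray(mask, dtype=bool)]
    if dists.size == 0:
        raise ValueError("no constrained frames to evaluate")
    return float(dists.mean() * 100.0)

# Toy data: a straight root path and a slightly perturbed generated path.
target = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
offsets = np.array([[0.003, 0.0, 0.004], [0.0, 0.0, 0.0], [0.0, 0.0, 0.01]])
generated = target + offsets

err_all = trajectory_error_cm(generated, target)
err_masked = trajectory_error_cm(generated, target, mask=[True, False, True])
```

The mask matters when constraints are sparse: averaging over unconstrained frames would mix in positions the model was never asked to hit.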

Text Condition Quality Preservation:

  • Semantic consistency with text descriptions.
  • Retains high-level features such as gait and style.
  • Preserves the naturalness and fluency of the motion.

Section 05

Application Scenarios of KV-Control

KV-Control's lightweight and precise control makes it suitable for:

  1. Animation Production: Adjust specific details (e.g., character path, hand position) without re-generating the entire action.
  2. Embodied Intelligence & Robotics: Apply to obstacle avoidance, precise end effector operations, and multi-constraint task execution.
  3. Game Development: Enable character movement along specific paths, precise interaction with environment objects, and style-consistent actions for level design.

Section 06

Limitations & Future Research Directions

Current Limitations:

  • Focuses only on geometric trajectory constraints; other constraints (physical, social) need exploration.
  • Generalization to unseen action types requires further verification.
  • Extending to multi-agent interaction scenarios is a challenge.

Future Directions:

  • Explore other types of constraints (physical, social).
  • Improve generalization to unseen actions.
  • Extend to multi-agent scenarios.
  • Apply the KV injection idea to other generative tasks (e.g., image layout control, speech prosody control).

Section 07

Conclusion: Value & Potential of KV-Control

KV-Control reframes trajectory control as a lightweight memory retrieval problem, providing a small, precise, and transparent control interface for text-to-motion generation. Its 'frozen main network + lightweight adapter' paradigm adds precise control while keeping the pre-trained model's capabilities, and the same idea may carry over to controlling other generative models. As embodied intelligence and virtual character applications grow, the ability to switch flexibly between semantic description and precise control is likely to become more important.