Tencent Releases HY-Motion 1.0: A Billion-Parameter Text-to-Motion Model

Billion-Parameter Text-to-Motion with HY-Motion 1.0

Tencent Hunyuan has released HY-Motion 1.0, an open-weight text-to-3D human motion model with 1 billion parameters. Built on the Diffusion Transformer (DiT) architecture and Flow Matching, HY-Motion 1.0 generates 3D human motion clips on an SMPL-H skeleton from natural language prompts and specified durations.
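To make the input and output concrete, here is a minimal sketch of how a prompt plus a duration maps to the size of a generated clip. It is not taken from the HY-Motion code: the `MotionRequest` class and `clip_shape` helper are hypothetical, the 30 fps frame rate is an assumption, and 52 is the joint count of the full SMPL-H skeleton (22 body joints plus 15 per hand), which the released model may not use in full.

```python
from dataclasses import dataclass

ASSUMED_FPS = 30          # assumption: the frame rate is not stated in this article
ASSUMED_NUM_JOINTS = 52   # full SMPL-H skeleton: 22 body joints + 15 joints per hand


@dataclass(frozen=True)
class MotionRequest:
    """Hypothetical container for the two inputs named above."""
    prompt: str
    duration_seconds: float

    def num_frames(self, fps: int = ASSUMED_FPS) -> int:
        # Convert the requested duration into a whole number of frames.
        if self.duration_seconds <= 0:
            raise ValueError("duration must be positive")
        return max(1, round(self.duration_seconds * fps))


def clip_shape(request, num_joints=ASSUMED_NUM_JOINTS, fps=ASSUMED_FPS):
    """Return the (frames, joints, 3) shape of a joint-position clip."""
    return (request.num_frames(fps), num_joints, 3)


request = MotionRequest("a person walking slowly", 5)
print(clip_shape(request))
```

Under these assumptions, a 5-second request corresponds to 150 frames of 3D joint positions.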

Why This Matters

Current text-to-motion systems often struggle to generate realistic, semantically accurate movement, especially for complex activities or longer sequences. Existing models frequently produce unnatural poses or jitter, or fail to follow the text instructions. These limitations hinder adoption in applications like game development and virtual avatars, where believable animation is critical: the output often needs extensive manual correction, which drives up production costs.

Key Insights

78.6% SSAE Score: HY-Motion 1.0 achieves a Structural Similarity and Animation Evaluation (SSAE) score of 78.6%, outperforming baseline models like DART and MoMask by over 20 percentage points.
Diffusion Transformers for Motion: The model adapts the DiT architecture to motion data, using transformer attention to model dependencies across the frames of a sequence.
Flow Matching for Stable Training: Training with Flow Matching, rather than traditional denoising diffusion, is reported to give more stable training and better results on long sequences.
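The Flow Matching idea can be sketched in a few lines of plain Python. This is the generic straight-line formulation, not HY-Motion's actual training code, whose details are not given here: a sample is moved from noise `x0` to data `x1` along a straight path, the network is trained to regress the path's velocity, and generation integrates that velocity from t=0 to t=1. In place of a trained DiT, the hypothetical helper `make_point_mass_oracle` supplies a closed-form velocity field for a single data point, so the sketch runs end to end.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 to data x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x1, rng):
    """Single-sample estimate of the velocity regression loss for data x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise sample
    t = rng.random()                          # time drawn uniformly from [0, 1)
    xt = interpolate(x0, x1, t)
    v_pred = predict_velocity(xt, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)


def euler_sample(predict_velocity, x0, num_steps):
    """Generate a sample by integrating dx/dt = v(x, t) with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_point_mass_oracle(x1):
    """Hypothetical stand-in for a trained network: the exact velocity
    field when the dataset consists of the single point x1."""
    def predict_velocity(x, t):
        return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]
    return predict_velocity


rng = random.Random(0)
data_point = [0.5, -1.0, 2.0]
oracle = make_point_mass_oracle(data_point)
noise = [rng.gauss(0.0, 1.0) for _ in data_point]
print(euler_sample(oracle, noise, num_steps=10))
```

In a real system `predict_velocity` would be the text-conditioned transformer and `x1` a full motion sequence. The regression target stays a plain difference `x1 - x0` either way, which is one commonly cited reason the objective is simple to train.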

Working Example

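# Illustrative inference sketch (simplified). The hy_motion module, the HYMotion
# class, and the method names below may not match the released code; check the
# official repository for the actual API.
from hy_motion import HYMotion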

# Load the model
model = HYMotion(size="1B") # or "Lite" for the smaller model

# Define the prompt and duration
prompt = "a person walking slowly"
duration = 5  # seconds

# Generate the motion
motion = model.generate(prompt, duration)

# Save the motion data (e.g., as a .bvh file)
motion.save("walking.bvh")

Practical Applications

Game Development: Automate the creation of character animations based on narrative scripts.
Virtual Reality/Metaverse: Enable more realistic and responsive avatars for immersive experiences.
Pitfall: Relying on synthetic prompt data without sufficient domain-specific fine-tuning can yield models that produce unrealistic or unnatural motions.

References:

https://www.marktechpost.com/2025/12/31/tencent-released-tencent-hy-motion-1-0-a-billion-parameter-text-to-motion-model-built-on-the-diffusion-transformer-dit-architecture-and-flow-matching/
