Tencent Releases HY-Motion 1.0: A Billion-Parameter Text-to-Motion Model
These articles are AI-generated summaries. Please check the original sources for full details.
Billion-Parameter Text-to-Motion with HY-Motion 1.0
Tencent Hunyuan has released HY-Motion 1.0, an open-weight text-to-3D human motion model with 1 billion parameters. Built on the Diffusion Transformer (DiT) architecture and Flow Matching, HY-Motion 1.0 generates 3D human motion clips on an SMPL-H skeleton from natural language prompts and specified durations.
Why This Matters
Current text-to-motion systems often struggle to generate realistic, semantically accurate movements, especially for complex activities or longer sequences. Existing models frequently produce unnatural poses or jitter, or fail to follow the text instructions. These limitations hinder adoption in applications such as game development and virtual avatars, where believable animation is critical, because the output often needs extensive manual correction that drives up production costs.
Key Insights
- 78.6% SSAE Score: HY-Motion 1.0 achieves a Structural Similarity and Animation Evaluation (SSAE) score of 78.6%, outperforming baseline models like DART and MoMask by over 20 percentage points.
- Diffusion Transformers for Motion: The model uses a DiT architecture adapted for motion data, whose attention mechanisms are well suited to sequence modeling.
- Flow Matching for Stable Training: Using Flow Matching rather than traditional denoising diffusion results in more stable training and better performance on long sequences.
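To make the Flow Matching idea concrete, here is a minimal sketch in plain Python. It assumes the common straight-line (linear interpolation) path between a noise sample and a data sample; the summary does not say which Flow Matching variant HY-Motion 1.0 uses, so the path choice, the function names, and the three-value "pose" are all illustrative assumptions, not the model's actual training code.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: three numbers stand in for a (hypothetical) tiny pose vector.
rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]                      # stand-in "data" sample
x0 = [rng.gauss(0.0, 1.0) for _ in x1]     # Gaussian noise sample


def oracle_velocity(x, t):
    # A perfectly trained network would output this for the pair (x0, x1).
    return target_velocity(x0, x1)


sample = euler_sample(oracle_velocity, x0, steps=8)
```

In training, a network would receive `interpolate(x0, x1, t)` and `t` and be penalized with `flow_matching_loss`; at inference, `euler_sample` replaces the oracle with the learned network. The regression target is a simple velocity rather than a noise-schedule-dependent quantity, which is one commonly cited reason for the stability claim above.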
Working Example
# Inference script example (simplified and illustrative: the `hy_motion` package,
# the `HYMotion` class, and its methods are placeholders, not a confirmed API)
from hy_motion import HYMotion

# Load the model
model = HYMotion(size="1B")  # or "Lite" for the smaller model

# Define the prompt and the clip duration
prompt = "a person walking slowly"
duration = 5  # seconds

# Generate the motion
motion = model.generate(prompt, duration)

# Save the motion data (e.g., as a .bvh file)
motion.save("walking.bvh")
Practical Applications
- Game Development: Automate the creation of character animations based on narrative scripts.
- Virtual Reality/Metaverse: Enable more realistic and responsive avatars for immersive experiences.
- Pitfall: Relying on synthetic prompting data without sufficient domain-specific fine-tuning can result in models that produce unrealistic or unnatural motions.