TAO HUANG

THE ONLY WAY TO DO GREAT WORK IS TO LOVE WHAT YOU DO.

Overall Research Framework

Multimodal Intelligence at the Core, Visual Generation for Modeling the World, Embodied Intelligence for Interacting with Reality

Research Highlights

1 Efficient Foundation Architectures

Kairos-3.0 World Model

A Pure Temporally-Linear DiT Architecture
We replace standard quadratic temporal attention with a hybrid design composed of linear- and local-attention mechanisms. This enables efficient modeling of long video sequences while maintaining strong temporal coherence and causal reasoning ability.
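As a rough illustration of the two ingredients named above, the sketch below computes causal linear attention over a sequence of frame tokens with a running state (cost linear in the number of frames) and a windowed softmax attention that only looks at recent frames. It is a minimal sketch under stated assumptions, not the Kairos-3.0 implementation: the `elu(x) + 1` feature map, the function names, and the `window` parameter are all hypothetical choices made for this example.

```python
import math


def feature_map(x):
    # elu(x) + 1: a common positive feature map for linear attention.
    # Assumed for illustration; the source does not name a feature map.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_linear_attention(queries, keys, values):
    """Causal attention over T frames in O(T) using a running state."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = feature_map(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = dot(fq, norm)  # strictly positive because phi(.) > 0
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d_k)) / denom
             for j in range(d_v)]
        )
    return outputs


def causal_quadratic_reference(queries, keys, values):
    """The same quantity computed the O(T^2) way, used only as a check."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(keys[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total
             for j in range(d_v)]
        )
    return outputs


def local_softmax_attention(queries, keys, values, window):
    """Causal softmax attention restricted to the last `window` frames."""
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for t, q in enumerate(queries):
        start = max(0, t - window + 1)
        scores = [dot(q, keys[s]) * scale for s in range(start, t + 1)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        outputs.append(
            [sum(e * values[start + i][j] for i, e in enumerate(exps)) / total
             for j in range(d_v)]
        )
    return outputs
```

The linear branch carries a fixed-size summary of all past frames, while the local branch keeps exact softmax weighting over a short recent window. A hybrid design can combine the two so that cost grows linearly with video length.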

MiniMax-01 LLM

Innovative Linear Attention Architecture
This is the first large-scale implementation of an architecture that mixes linear and softmax attention. The model has 456B total parameters in a Mixture-of-Experts (MoE) structure, with 45.9B parameters activated per token.
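To make the two numbers above concrete, the short sketch below computes the share of parameters that is active at once and builds an illustrative layer schedule that interleaves linear and softmax attention. The parameter counts come from the text. The `hybrid_layer_pattern` helper and its `softmax_every` ratio are hypothetical and are not taken from the model's published configuration.

```python
TOTAL_PARAMS_B = 456.0   # total parameters, in billions (from the text above)
ACTIVE_PARAMS_B = 45.9   # activated parameters, in billions (from the text above)


def active_fraction(active_b, total_b):
    """Share of the MoE model's parameters that is activated at once."""
    return active_b / total_b


def hybrid_layer_pattern(num_layers, softmax_every):
    """Hypothetical schedule: every `softmax_every`-th layer uses softmax
    attention and all other layers use linear attention."""
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "linear"
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    print(f"active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")
    print(hybrid_layer_pattern(8, softmax_every=4))
```

With these figures, roughly one tenth of the parameters is active at a time, which is why an MoE model of this size can run at the compute cost of a much smaller dense model.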

Mamba for Vision (Vision SSM)

Highlights coming soon.

2 Knowledge Distillation

DIST Knowledge Distillation

A series of practical and effective knowledge distillation (KD) methods. Highlights and visual summaries coming soon.
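For readers new to the topic, the sketch below shows the classic temperature-scaled distillation loss, in which a student is trained to match a teacher's softened class probabilities. It is a generic baseline given for orientation only and is not claimed to be the DIST objective. The function names and the default `temperature` are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened probabilities, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across
    temperatures, as in the classic formulation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. In practice it is usually combined with the ordinary cross-entropy loss on ground-truth labels.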

More KD Highlights Coming

Additional techniques and benchmarks are being organized.

3 Embodied AI

Embodied World Models

Learning world models for interaction and long-horizon planning. Highlights coming soon.

VLA & Policy Learning

Vision-Language-Action agents and efficient policy learning for real-world tasks.