[Figure: architecture overview. Four input modalities each pass through an FSQ Tokenizer: Language (e.g. "Wave to a friend", "Practice martial arts"), Music (audio waveforms, rhythm patterns), Trajectory (path waypoints, navigation goals), and Motion (human pose data, movement sequences). The resulting tokens form a Shared Token Pool (Common Code Embedding Space) covering Language, Music, Trajectory, and Motion, which feeds a Multimodal LLM (Unified Understanding). A Causal Decoder performs frame-wise DoF decoding with a padded Causal Convolution (k=3), mapping input to output frames for real-time streaming. A Motion Tracker (Humanoid Robot Control) takes the Joint DoF Values (29 dims) and produces the Robot Motion Output.]
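
The "Causal Convolution (k=3)" and "Pad" labels in the figure suggest a convolution that is padded on the left only, so each output frame depends on the current frame and the two before it. The following is a minimal sketch of that idea, not the authors' decoder: it assumes one scalar per frame and zero padding, and the names `causal_conv1d` and `StreamingCausalConv` and the example weights are hypothetical. It shows that a frame-by-frame streaming version produces the same values as the offline version, which is what makes real-time decoding possible.

```python
from collections import deque
from typing import List, Sequence


def causal_conv1d(frames: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Offline causal convolution: output[t] uses only frames[t-k+1 .. t]."""
    k = len(weights)
    # Left-pad with k-1 zeros (assumed padding value) so no future frame is read.
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(w * x for w, x in zip(weights, padded[t:t + k]))
        for t in range(len(frames))
    ]


class StreamingCausalConv:
    """Same computation, one frame at a time, using a rolling buffer of k frames."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)
        k = len(self.weights)
        # The initial zeros play the role of the left padding.
        self.buffer = deque([0.0] * k, maxlen=k)

    def step(self, frame: float) -> float:
        self.buffer.append(frame)  # the oldest frame drops out automatically
        return sum(w * x for w, x in zip(self.weights, self.buffer))


weights = [0.25, 0.5, 0.25]      # hypothetical k=3 kernel
frames = [1.0, 2.0, 3.0, 4.0]    # hypothetical per-frame inputs

offline = causal_conv1d(frames, weights)
stream = StreamingCausalConv(weights)
streamed = [stream.step(x) for x in frames]
```

Because the padding is on the left only, `streamed` matches `offline` element for element, and changing a later frame never alters an earlier output. A real decoder would apply this per channel with learned weights, for example across the 29 joint DoF dimensions named in the figure.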