SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
… Driven by these designs, SANA-WM demonstrates remarkable efficiency across data, training compute, and inference hardware: it uses only ~213K public video clips with metric-scale pose supervision, completes training in 15 days on 64 H100s, and generates each 60-second clip on a single GPU; its distille…
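To put the training budget in more familiar units, the wall-clock figure can be converted to GPU-hours. This is a rough upper-bound sketch that assumes all 64 H100s were occupied for the full 15 days; the page does not report utilization or the exact number of hours.

```latex
\underbrace{15\ \text{days}}_{\text{wall clock}} \times 24\ \tfrac{\text{h}}{\text{day}} \times 64\ \text{GPUs}
  = 360\ \text{h} \times 64
  = 23{,}040\ \text{H100 GPU-hours}
```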