Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
…TIDAL: a dual-axis distillation schedule, defined over both training progress and diffusion timestep, that down-weights unreliable teacher signals in high-mask-ratio (high-noise) regions. CompDemo: complementary mask demonstrations with dual teacher forward…
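The snippet names TIDAL's two axes but not its functional form. The following is a minimal sketch, assuming (1) the diffusion timestep `t` is read as the mask ratio in [0, 1], (2) the teacher weight decays linearly in `t` toward a floor, and (3) it is cosine-annealed over training progress toward a second floor. The names `tidal_weight` and `blended_token_loss` and the parameters `t_floor` and `p_floor` are hypothetical illustrations, not the paper's method or values.

```python
import math
from typing import Sequence


def tidal_weight(progress: float, t: float,
                 t_floor: float = 0.2, p_floor: float = 0.5) -> float:
    """Hypothetical dual-axis weight on the teacher (distillation) term.

    progress: fraction of training completed, in [0, 1].
    t: diffusion timestep read as the mask ratio in [0, 1]
       (higher t means more masked tokens, i.e. noisier inputs).
    t_floor, p_floor: assumed minimum weights, not taken from the paper.
    """
    if not (0.0 <= progress <= 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("progress and t must both lie in [0, 1]")
    # Timestep axis: trust the teacher fully at t = 0, decaying linearly
    # to t_floor at t = 1 (the high-mask / high-noise end).
    w_t = 1.0 - (1.0 - t_floor) * t
    # Training axis: cosine anneal from 1 at the start to p_floor at the end.
    w_p = p_floor + (1.0 - p_floor) * 0.5 * (1.0 + math.cos(math.pi * progress))
    return w_t * w_p


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def blended_token_loss(teacher: Sequence[float], student: Sequence[float],
                       target: int, progress: float, t: float) -> float:
    """Mix a distillation term and a ground-truth term for one masked token."""
    w = tidal_weight(progress, t)
    distill = kl(teacher, student)      # pull the student toward the teacher
    ce = -math.log(student[target])     # cross-entropy against the data token
    return w * distill + (1.0 - w) * ce


if __name__ == "__main__":
    for progress, t in [(0.0, 0.1), (0.0, 0.9), (1.0, 0.1), (1.0, 0.9)]:
        print(f"progress={progress:.1f} t={t:.1f} "
              f"w={tidal_weight(progress, t):.3f}")
```

Under these assumptions the teacher term is largest early in training at low mask ratios and smallest late in training at high mask ratios, with the freed-up weight shifted onto the ground-truth cross-entropy. A multiplicative combination keeps the two axes independent and easy to ablate; the paper may combine them differently.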