SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
…gradual architecture transitions lead to better optimization trajectories. Putting it all together, we compress Qwen3-Next-80A3B to a 23A2B model that retains competitive performance. These results offer practical guidance for efficient…