Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
…While diffusion language models attempt to break this barrier via parallel generation, they suffer from significant performance degradation, high training costs, and a lack of rigorous convergence guarantees. Orthrus resolves this dichotomy…