Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding | NVIDIA Technical Blog
…paper’s release, the research team has released 20 DFlash model checkpoints on Hugging Face with Blackwell and Hopper recipes, covering model families including Qwen, Kimi K2.6, Llama, Gemma, and gpt…
