Accelerating GPT-OSS-20B on AMD Ryzen™ AI NPUs: Efficient MoE Inference on Strix and Halo
…To address this, we implement a hybrid QMoE execution strategy optimized for Ryzen™ AI. Top-K routing is performed on the CPU, where the gating network selects the most relevant experts for…
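As a rough illustration of the CPU-side Top-K routing step, here is a minimal sketch in plain Python. It is not the Ryzen™ AI implementation: the function name `top_k_route`, the example logits, and the choice to renormalize with a softmax over only the selected logits are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Hypothetical helper: the weighting scheme (softmax over the selected
    logits only) is an assumption, not taken from the source.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")

    # Rank expert indices by logit, highest first; ties keep the lower index.
    ranked = sorted(range(len(router_logits)), key=lambda i: -router_logits[i])
    chosen = ranked[:k]

    # Softmax over the selected logits, shifted by the max for stability.
    peak = router_logits[chosen[0]]
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Illustrative gating scores for one token over four hypothetical experts.
routes = top_k_route([0.5, 2.0, -1.0, 1.0], k=2)
for expert_index, weight in routes:
    print(f"expert {expert_index}: weight {weight:.4f}")
```

Only the experts returned by the router need to run for a given token, so the per-token compute scales with `k` rather than with the total number of experts.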