Accelerating GPT-OSS-20B on AMD Ryzen™ AI NPUs: Efficient MoE Inference on Strix and Halo
…For deployment on AMD Ryzen™ AI platforms, we use an INT4-quantized ONNX version of the model. INT4 is natively supported by the Ryzen™ AI NPU, enabling higher throughput and improved power…
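To make "INT4-quantized" concrete, here is a minimal sketch of symmetric, per-group 4-bit weight quantization in plain Python. It is an illustration only: the excerpt does not say which quantization scheme the ONNX model uses, so the group size of 32, the symmetric range, the nibble packing order, and the function names `quantize_int4`, `dequantize_int4`, and `pack_int4` are all assumptions made for this example.

```python
def quantize_int4(weights, group_size=32):
    """Symmetric per-group INT4 quantization (assumed scheme, for illustration).

    Returns (codes, scales): one signed integer in [-8, 7] per weight and
    one floating-point scale per group of `group_size` weights.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; use 1.0 for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(-8, min(7, q)))  # clamp to the signed 4-bit range
    return codes, scales


def dequantize_int4(codes, scales, group_size=32):
    """Reconstruct approximate weights from INT4 codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


def pack_int4(codes):
    """Pack two signed 4-bit codes into each byte, low nibble first (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)
```

Under these assumptions, each weight occupies 4 bits instead of the 16 bits of an FP16 value, plus a small overhead for the per-group scales, and the reconstruction error of any weight is at most half of its group's scale.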