Accelerating GPT-OSS-20B on AMD Ryzen™ AI NPUs: Efficient MoE Inference on Strix and Halo
…The original model was trained using the MXFP4 numeric format. For deployment on AMD Ryzen™ AI platforms, we use an INT4-quantized ONNX version of the model. INT4 is natively supported by the…
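
To make the idea of INT4 weight storage concrete, here is a minimal sketch of symmetric per-group 4-bit quantization with two values packed into each byte. The excerpt does not describe the actual quantization recipe used for the ONNX model, so the scheme (symmetric, one scale per group), the default group size of 32, and the helper names `quantize_int4` and `dequantize_int4` are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float], group_size: int = 32) -> Tuple[bytes, List[float]]:
    """Hypothetical helper: symmetric per-group INT4 quantization.

    Returns the packed 4-bit values (two per byte, low nibble first)
    and one floating-point scale per group.
    """
    if group_size <= 0 or group_size % 2 != 0:
        raise ValueError("group_size must be a positive even number")
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")

    packed = bytearray()
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; use 1.0 for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
        q = [max(-8, min(7, round(w / scale))) for w in group]
        # Pack consecutive pairs as two's-complement nibbles in one byte.
        for low, high in zip(q[0::2], q[1::2]):
            packed.append((low & 0x0F) | ((high & 0x0F) << 4))
    return bytes(packed), scales


def dequantize_int4(packed: bytes, scales: List[float], group_size: int = 32) -> List[float]:
    """Hypothetical helper: unpack the nibbles and rescale them to floats."""
    out: List[float] = []
    for i, byte in enumerate(packed):
        scale = scales[(2 * i) // group_size]  # each byte holds two weights
        for nibble in (byte & 0x0F, byte >> 4):
            signed = nibble - 16 if nibble >= 8 else nibble  # sign-extend 4 bits
            out.append(signed * scale)
    return out
```

With this layout each weight costs 4 bits plus a small per-group overhead for the scale, and the reconstruction error of any weight is at most half of its group's scale. Production INT4 formats may differ in group size, zero-point handling, and nibble order.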