AMD Delivers Breakthrough MLPerf Inference 6.0 Results
… On Llama 2 70B, we scaled from one node to 11 nodes and stayed remarkably close to ideal linear scaling. At 11 nodes and 87 AMD Instinct MI355X GPUs, we delivered 1,042,110 tokens per second in Offline, 1,016,380 tokens per second in Server and 785,522 tokens per second in Interactive. …
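
To put the aggregate figures in per-GPU terms, here is a minimal Python sketch that divides each quoted throughput by the 87 GPUs. It also shows one common way to express "close to ideal linear scaling": the ratio of per-GPU throughput at scale to per-GPU throughput on a smaller baseline. The function names are illustrative assumptions, and no single-node baseline is quoted in this excerpt, so `scaling_efficiency` is defined but not evaluated against the real results.

```python
# Aggregate throughput quoted above for 11 nodes / 87 GPUs (tokens per second).
THROUGHPUT_TOKENS_PER_S = {
    "Offline": 1_042_110,
    "Server": 1_016_380,
    "Interactive": 785_522,
}
NUM_GPUS = 87


def per_gpu_throughput(total_tokens_per_s: float, num_gpus: int) -> float:
    """Average tokens per second contributed by each GPU."""
    return total_tokens_per_s / num_gpus


def scaling_efficiency(
    scaled_total: float, scaled_gpus: int, base_total: float, base_gpus: int
) -> float:
    """Per-GPU throughput at scale relative to the baseline run.

    A value of 1.0 corresponds to ideal linear scaling. The baseline
    numbers must come from a measured smaller run; none are assumed here.
    """
    return per_gpu_throughput(scaled_total, scaled_gpus) / per_gpu_throughput(
        base_total, base_gpus
    )


for scenario, total in THROUGHPUT_TOKENS_PER_S.items():
    per_gpu = per_gpu_throughput(total, NUM_GPUS)
    print(f"{scenario}: {per_gpu:,.0f} tokens/s per GPU")
```

Given a measured single-node result, calling `scaling_efficiency(scaled_total, 87, base_total, base_gpus)` would return the fraction of ideal linear scaling achieved in that scenario.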