AMD PACE - High-Performance Platform Aware Compute Engine
…Redundant transposes and KV head expansions (e.g., repeats) are eliminated. GQA is handled natively in the attention backend, avoiding the costly expand-reshape-copy pattern that dominates CPU decode in standard…