Scaling Biomolecular Modeling Using Context Parallelism in NVIDIA BioNeMo | NVIDIA Technical Blog
…The NVIDIA team implemented halo-exchange-based distributed primitives to partition the atom features, so that subsequent window-batch attention requires no inter-GPU communication.

Context parallelism implementation for triangle multiplication

The…
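Returning to the halo-exchange partitioning of atom features mentioned in the excerpt: the idea can be illustrated with a small single-process simulation. This is a minimal sketch and not the BioNeMo implementation. It assumes a simple sliding window in which each atom attends to neighbors within `halo` positions, uses the raw features as queries, keys, and values, and simulates GPUs as NumPy array shards. The names `windowed_attention` and `halo_exchange` are hypothetical. Once each shard has been padded with `halo` rows from its neighbors, the attention step runs on local data only and reproduces the unsharded result.

```python
import numpy as np

def windowed_attention(q, k, v, q_pos, k_pos, half_window):
    """Softmax attention where each query sees only keys within half_window positions."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    mask = np.abs(q_pos[:, None] - k_pos[None, :]) <= half_window
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def halo_exchange(shards, halo):
    """Simulated neighbor exchange: pad each shard with `halo` rows from each neighbor.

    On real hardware this would be a point-to-point send/receive between
    adjacent ranks; here the shards simply live in one Python list.
    """
    padded = []
    for r, shard in enumerate(shards):
        if r > 0:
            prev = shards[r - 1]
            left = prev[len(prev) - halo:]
        else:
            left = shard[:0]  # empty: no left neighbor
        right = shards[r + 1][:halo] if r < len(shards) - 1 else shard[:0]
        padded.append(np.concatenate([left, shard, right], axis=0))
    return padded

rng = np.random.default_rng(0)
n_atoms, dim, n_ranks, halo = 64, 8, 4, 3  # illustrative sizes only
x = rng.standard_normal((n_atoms, dim))
pos = np.arange(n_atoms)

# Reference: unsharded windowed attention over all atoms.
reference = windowed_attention(x, x, x, pos, pos, halo)

# Shard atoms across simulated ranks, then exchange halos once.
shards = np.split(x, n_ranks)
pos_shards = np.split(pos, n_ranks)
padded = halo_exchange(shards, halo)
padded_pos = halo_exchange(pos_shards, halo)

# Each rank now computes attention using only its own padded shard.
local_out = [
    windowed_attention(s, p, p, sp, pp, halo)
    for s, p, sp, pp in zip(shards, padded, pos_shards, padded_pos)
]
distributed = np.concatenate(local_out, axis=0)
max_abs_diff = float(np.abs(distributed - reference).max())
print("max abs difference vs. unsharded:", max_abs_diff)
```

The sketch requires `halo` to be no larger than the shard length, so that every key a query needs lives either on its own rank or on an adjacent one. The communication cost is then one exchange of `halo` rows per neighbor before attention, rather than communication during the attention computation itself.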