Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP | NVIDIA Technical Blog
…Integration: ~30 Lines of Python

The integration effort is minimal. Here’s a drop-in replacement for torch.save / torch.load:

# CUDA 13.x (Blackwell): pip install nvidia-nvcomp-cu13 cupy-cuda13x…
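
The excerpt cuts off before the wrapper itself, so what follows is only a rough sketch of the shape such a drop-in pair might take, not the post's code. It deliberately avoids the nvCOMP API: Python's standard-library zlib stands in for the GPU compressor and pickle stands in for torch.save / torch.load, so the sketch runs anywhere. The names save_compressed and load_compressed are hypothetical.

```python
import io
import pickle
import zlib


def save_compressed(obj, f, level=6):
    """Serialize obj, compress the bytes, and write them to a path or file-like object.

    Assumption: pickle + zlib are CPU stand-ins for torch.save + nvCOMP.
    Returns (raw_size, stored_size) in bytes so the saving can be inspected.
    """
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    blob = zlib.compress(raw, level)
    if hasattr(f, "write"):
        f.write(blob)
    else:
        with open(f, "wb") as fh:
            fh.write(blob)
    return len(raw), len(blob)


def load_compressed(f):
    """Read a blob written by save_compressed, decompress it, and deserialize it."""
    if hasattr(f, "read"):
        blob = f.read()
    else:
        with open(f, "rb") as fh:
            blob = fh.read()
    return pickle.loads(zlib.decompress(blob))


# Round trip through an in-memory buffer with a toy "checkpoint".
state = {"step": 100, "weights": [0.0] * 10_000}
buf = io.BytesIO()
raw_size, stored_size = save_compressed(state, buf)
buf.seek(0)
restored = load_compressed(buf)
print(f"raw={raw_size} bytes, stored={stored_size} bytes")
```

Keeping the call signature close to save(obj, f) / load(f) is what makes a wrapper like this a drop-in: existing checkpoint code only needs the function names swapped. A GPU-backed version would replace the zlib calls with the compressor's encode and decode steps and keep the rest of the structure.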