CUDA 13.2 Introduces Enhanced CUDA Tile Support and New Python Features | NVIDIA Technical Blog
…Fixed-size segmented reduction: CCCL 3.2 now provides a new cub::DeviceSegmentedReduce variant that accepts a uniform segment_size, eliminating offset iterator overhead in the common case when segments are fixed…
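
To make the difference concrete, here is a minimal host-side sketch in plain C++ that contrasts the two ways of describing segments. It is not the CUB API and does not run on the GPU: the function names segmented_sum_offsets and segmented_sum_fixed are hypothetical, chosen only for illustration, and the exact signature of the new cub::DeviceSegmentedReduce overload is not reproduced here. The point is only that, with a uniform segment_size, each segment's bounds follow from its index, so no offset array has to be built or read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference only; these helpers are hypothetical and are NOT the CUB API.

// Offset-based form: segment s spans [offsets[s], offsets[s + 1]).
// This needs num_segments + 1 offsets, which must be stored and read.
std::vector<int> segmented_sum_offsets(const std::vector<int>& in,
                                       const std::vector<std::size_t>& offsets) {
    assert(!offsets.empty() && offsets.back() <= in.size());
    std::vector<int> out(offsets.size() - 1, 0);
    for (std::size_t seg = 0; seg + 1 < offsets.size(); ++seg) {
        for (std::size_t i = offsets[seg]; i < offsets[seg + 1]; ++i) {
            out[seg] += in[i];
        }
    }
    return out;
}

// Fixed-size form: segment s spans [s * segment_size, (s + 1) * segment_size).
// The bounds are computed from the segment index, so no offsets are needed.
std::vector<int> segmented_sum_fixed(const std::vector<int>& in,
                                     std::size_t num_segments,
                                     std::size_t segment_size) {
    assert(in.size() == num_segments * segment_size);
    std::vector<int> out(num_segments, 0);
    for (std::size_t seg = 0; seg < num_segments; ++seg) {
        const std::size_t begin = seg * segment_size;
        for (std::size_t i = 0; i < segment_size; ++i) {
            out[seg] += in[begin + i];
        }
    }
    return out;
}
```

For an input of six values split into three segments of two, segmented_sum_offsets(in, {0, 2, 4, 6}) and segmented_sum_fixed(in, 3, 2) return the same sums. The second call simply carries less metadata, which is the saving the fixed-size variant targets.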