C-DAC Achieves 1.75x Performance Improvement … Because the original code used multiple CUDA streams with asynchronous calls, the migrated code needed either appropriate barrier/wait calls or a single SYCL queue to maintain data consistency. …
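The two remedies named above can be sketched on the host with standard C++ futures. This is only an analogy, not SYCL or CUDA code and not the code from the case study: `produce`, `consume`, `run_with_explicit_wait`, and `run_in_order` are hypothetical names, and each `std::async` task stands in for a kernel submitted asynchronously. In real SYCL code the corresponding mechanisms would be waiting on an event or queue, or creating the queue with the in-order property.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for two dependent kernels that were originally
// launched on separate CUDA streams.
void produce(std::vector<int>& buf) {
    std::iota(buf.begin(), buf.end(), 1);  // writes 1, 2, ..., n
}

long long consume(const std::vector<int>& buf) {
    return std::accumulate(buf.begin(), buf.end(), 0LL);  // reads every element
}

// Remedy 1: keep independent asynchronous submissions, but place an explicit
// wait between the producer and the consumer that depends on it.
long long run_with_explicit_wait(std::size_t n) {
    std::vector<int> buf(n, 0);
    std::future<void> produced =
        std::async(std::launch::async, produce, std::ref(buf));
    produced.wait();  // without this wait, consume could read a half-written buf
    std::future<long long> consumed =
        std::async(std::launch::async, consume, std::cref(buf));
    return consumed.get();
}

// Remedy 2: funnel both tasks through a single worker that executes them in
// submission order, the analogue of using one in-order queue.
long long run_in_order(std::size_t n) {
    std::vector<int> buf(n, 0);
    std::future<long long> worker = std::async(std::launch::async, [&buf] {
        produce(buf);         // runs first
        return consume(buf);  // runs only after produce has finished
    });
    return worker.get();
}
```

Calling either function with `n = 1000` returns 500500, the sum of 1 through 1000. The first variant keeps the tasks separate at the cost of an explicit synchronization point; the second removes the hazard by serializing the work, which gives up any overlap between the two tasks.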