C++ Standard Parallelism for HPC Performance Portability

Date:

Teamed up with Bryce Adelstein Lelbach (NVIDIA) and Philipp Zimmermann (Fraunhofer ITWM) to deliver a half-day, hands-on tutorial at ISC26. This was my second time co-presenting on C++ standard parallelism with Bryce. We walked folks through accelerating HPC workloads across AMD, Intel, and NVIDIA GPUs using standard C++20, emphasizing how to write performance-portable code without relying on vendor-specific language extensions. Covered everything from multi-dimensional loops and reductions to overlapping MPI communication with GPU compute, all backed by browser-based exercises running on cloud GPUs.