8 hours of instruction
Explore how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs to run on massively parallel NVIDIA GPUs.
OBJECTIVES
- GPU-accelerate NumPy ufuncs with a few lines of code
- Configure code parallelization using the CUDA thread hierarchy
- Write custom CUDA device kernels for maximum performance and flexibility
- Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth
PREREQUISITES
None
SYLLABUS & TOPICS COVERED
- Introduction
  - Meet the instructor and create an account
- Introduction To CUDA Python
  - Begin working with the Numba compiler and CUDA programming in Python
  - Use Numba decorators to GPU-accelerate numerical Python functions
  - Optimize host-to-device and device-to-host memory transfers
- Custom CUDA Kernels In Python
  - Learn CUDA’s parallel thread hierarchy and how to extend parallel program possibilities
  - Launch massively parallel, custom CUDA kernels on the GPU
  - Utilize CUDA atomic operations to avoid race conditions during parallel execution
- RNG, Multidimensional Grids, And Shared Memory
  - Use the xoroshiro128+ RNG to support GPU-accelerated Monte Carlo methods
  - Learn multidimensional grid creation and how to work in parallel on 2D matrices
  - Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices
- Final Review
  - Review key learnings, wrap up questions, and complete the assessment to earn a certificate; then take the workshop survey.
SOFTWARE REQUIREMENTS
Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.