Fundamentals of Accelerated Computing with CUDA Python

Explore how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs to run on massively parallel NVIDIA GPUs.

8 hours of instruction

OBJECTIVES

  1. GPU-accelerate NumPy ufuncs with a few lines of code
  2. Configure code parallelization using the CUDA thread hierarchy
  3. Write custom CUDA device kernels for maximum performance and flexibility
  4. Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth

PREREQUISITES

None

SYLLABUS & TOPICS COVERED

  1. Introduction
    • Meet the instructor and create an account
  2. Introduction to CUDA Python
    • Begin working with the Numba compiler and CUDA programming in Python
    • Use Numba decorators to GPU-accelerate numerical Python functions
    • Optimize host-to-device and device-to-host memory transfers
  3. Custom CUDA Kernels in Python
    • Learn CUDA’s parallel thread hierarchy and how it extends the range of parallel programs you can express
    • Launch massively parallel, custom CUDA kernels on the GPU
    • Utilize CUDA atomic operations to avoid race conditions during parallel execution
  4. RNG, Multidimensional Grids, and Shared Memory
    • Use the xoroshiro128+ RNG to support GPU-accelerated Monte Carlo methods
    • Learn multidimensional grid creation and how to work in parallel on 2D matrices
    • Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices
  5. Final Review
    • Review key learnings, wrap up remaining questions, and complete the assessment to earn a certificate; then take the workshop survey

SOFTWARE REQUIREMENTS

Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.
