8 hours of instruction
Explore how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs to run on massively parallel NVIDIA GPUs.
OBJECTIVES
- GPU-accelerate NumPy ufuncs with a few lines of code
- Configure code parallelization using the CUDA thread hierarchy
- Write custom CUDA device kernels for maximum performance and flexibility
- Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth
PREREQUISITES
None
SYLLABUS & TOPICS COVERED
- Introduction
  - Meet the instructor and create an account
- Introduction To CUDA Python
  - Begin working with the Numba compiler and CUDA programming in Python
  - Use Numba decorators to GPU-accelerate numerical Python functions
  - Optimize host-to-device and device-to-host memory transfers
- Custom CUDA Kernels In Python
  - Learn CUDA’s parallel thread hierarchy and how to extend parallel program possibilities
  - Launch massively parallel, custom CUDA kernels on the GPU
  - Utilize CUDA atomic operations to avoid race conditions during parallel execution
- RNG, Multidimensional Grids, And Shared Memory
  - Use the xoroshiro128+ RNG to support GPU-accelerated Monte Carlo methods
  - Learn multidimensional grid creation and how to work in parallel on 2D matrices
  - Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices
- Final Review
  - Review key learnings, wrap up questions, and complete the assessment to earn a certificate; then take the workshop survey.
SOFTWARE REQUIREMENTS
Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.