This course gives an overview of the most relevant programming paradigms and techniques for accelerating computationally demanding tasks on GPU-based heterogeneous HPC architectures.
The course will start with an architectural overview of modern GPU-based heterogeneous architectures, focusing on their computing power versus their data movement needs. It will cover both a high-level (pragma-based) programming approach with OpenACC, for a fast start to porting, and lower-level approaches based on the NVIDIA CUDA programming language, for finer-grained, computationally intensive tasks. Other approaches, such as OpenMP offload, OpenCL, and AMD HIP, will be covered for comparison. Particular attention will be given to performance tuning and to techniques for overcoming common bottlenecks related to data movement and memory access patterns. Examples and exercises will be provided in both C and Fortran.
By the end of the course, students will be able to:
understand the strengths and weaknesses of GPUs as accelerators
program GPU accelerated applications using both higher and lower level programming approaches
overcome problems and bottlenecks regarding data movement between host and device memories
make best use of independent execution queues for concurrent computing/data-movement operations
The course is aimed at researchers and programmers interested in porting scientific applications, or in using efficient post-processing and data-analysis techniques, on modern heterogeneous HPC architectures.
Basic knowledge of C or Fortran is mandatory, along with some experience of programming on Linux or Unix. Basic knowledge of any parallel programming technique or paradigm is recommended.