Description:
Large-scale GPU clusters are becoming the standard in the HPC scientific computing community. These architectures, with multiple CPU sockets and GPU devices in the same node, are complex and can be hard to tame.
The Advanced School on HPC Computing with GPU Accelerators provides a comprehensive training path for developers who want to take their scientific applications to the next level.
The course will start with an architectural overview of modern GPU-based heterogeneous clusters, focusing on their components, computing units, memory interconnects and data-movement needs. We will teach you how to profile your applications, identify bottlenecks, and select or optimize computationally intensive sections to run on GPU accelerators. We will explain how to exploit concurrent execution on both CPUs and GPUs while optimizing data transfers and communications.
The course will cover both a high-level (pragma-based) programming approach, for quickly porting existing code, and a lower-level (language-level) approach for fine-grained, computationally intensive tasks. Special attention will be given to performance tuning and to techniques for overcoming common bottlenecks caused by data movement and memory access patterns.
Skills:
By the end of the course, students will be able to:
- understand the strengths and weaknesses of GPUs as accelerators
- program GPU-accelerated applications using both higher- and lower-level programming approaches
- profile their applications, identify bottlenecks, make a porting plan, and iteratively refine and improve the code
- make the best use of independent execution queues for concurrent computation and data movement
Target audience:
Researchers and programmers interested in porting scientific applications, or in using efficient post-processing and data-analysis techniques, on modern heterogeneous HPC architectures.
Pre-requisites:
Basic knowledge of C or Fortran is mandatory. The development environment will be Linux-based. Basic knowledge of any parallel programming technique or paradigm is recommended.