Enabling efficient and future-proof HPC applications: High-level component-based programming frameworks for heterogeneous parallel systems

Recent disruptive changes in computer hardware (in particular, the transition to multi-/manycore and heterogeneous architectures) have led to a crisis on the application software side: efficient programming for modern parallel and heterogeneous systems has become more tedious, error-prone and hardware-specific than ever. This holds in particular for GPU-based systems, which are increasingly popular in high-performance computing and whose architecture is evolving quickly. But which application writer has the time to rewrite or re-optimize their code for each new hardware generation?

SeRC researcher Christoph Kessler and his group at Linköping University investigate techniques for the design and implementation of component-based programming frameworks for GPU-based systems. Programs are structured into components. A component models a computation through an interface that describes how the computation can be invoked, and it usually encapsulates multiple implementations that model different ways to carry out that computation on the same or different types of execution units, such as CPU cores or GPUs.

In recent years, Kessler’s group has designed and implemented three major prototype frameworks to demonstrate how such components can suitably be specified, and how automated implementation selection and memory management mechanisms can lead to better programmability, portability and performance.

In particular, Kessler’s group developed the C++-based skeleton programming library SkePU. Skeletons are pre-defined generic components for common computation and communication patterns such as map, reduce, scan and farm. They can be parameterized with sequential, application-specific code and used as application building blocks, along with STL-like container data structures representing vectors, matrices and similar data. The SkePU programmer interface remains completely sequential: all architecture-specific features such as parallelism, memory management, communication, synchronization and heterogeneity are encapsulated within skeletons and containers. Static and run-time optimizations of the program execution flow (e.g., implementation selection, parametric autotuning, data transfer optimization) are performed automatically.

SkePU programs offer a high level of abstraction and are highly portable and efficient; performance portability has been demonstrated across various GPU-based systems. SkePU is maintained as a long-term activity within SeRC and is available as open-source software at http://www.ida.liu.se/~chrke/skepu
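To make the component idea described above more concrete, the following is a minimal sketch in plain standard C++. It is not SkePU code and does not use the SkePU API: the names `MapVariant`, `MapSkeleton`, `min_size` and `last_used` are hypothetical, the size-threshold selection rule is an assumed stand-in for the tuned, performance-aware selection described above, and the "blocked" variant only stands in for a real parallel or GPU backend. The sketch shows a map skeleton that is parameterized with sequential user code, hides several implementation variants behind one sequential-looking interface, and selects a variant automatically at call time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is NOT the SkePU API.

// One implementation variant of the "map" component.
template <typename T>
struct MapVariant {
    std::string name;      // label of the variant
    std::size_t min_size;  // smallest input size this variant is meant for
    std::function<void(const std::function<T(T)>&,
                       const std::vector<T>&, std::vector<T>&)> run;
};

// The component: one interface, several encapsulated implementations.
template <typename T>
class MapSkeleton {
public:
    explicit MapSkeleton(std::function<T(T)> user_fn) : fn_(std::move(user_fn)) {
        // Variant 1: plain sequential loop.
        variants_.push_back({"sequential", 0,
            [](const std::function<T(T)>& f,
               const std::vector<T>& in, std::vector<T>& out) {
                for (std::size_t i = 0; i < in.size(); ++i) out[i] = f(in[i]);
            }});
        // Variant 2: blocked traversal, an assumed stand-in for a
        // parallel or GPU backend (it still runs on one CPU thread here).
        variants_.push_back({"blocked", 1024,
            [](const std::function<T(T)>& f,
               const std::vector<T>& in, std::vector<T>& out) {
                const std::size_t block = 256;
                for (std::size_t start = 0; start < in.size(); start += block) {
                    const std::size_t end = std::min(start + block, in.size());
                    for (std::size_t i = start; i < end; ++i) out[i] = f(in[i]);
                }
            }});
    }

    // The caller sees an ordinary sequential call; selection happens inside.
    std::vector<T> operator()(const std::vector<T>& in) {
        std::vector<T> out(in.size());
        const MapVariant<T>& v = select(in.size());
        last_used_ = v.name;
        v.run(fn_, in, out);
        return out;
    }

    // Name of the variant chosen by the most recent call.
    const std::string& last_used() const { return last_used_; }

private:
    // Assumed selection rule: among the variants whose min_size does not
    // exceed the input size, pick the one with the largest min_size.
    const MapVariant<T>& select(std::size_t n) const {
        const MapVariant<T>* best = &variants_.front();
        for (const auto& v : variants_) {
            if (v.min_size <= n && v.min_size >= best->min_size) best = &v;
        }
        return *best;
    }

    std::function<T(T)> fn_;
    std::vector<MapVariant<T>> variants_;
    std::string last_used_;
};
```

A caller would construct the skeleton once, for example `MapSkeleton<int> square([](int x) { return x * x; });`, and then apply it to a `std::vector<int>` like an ordinary function. Under the assumed threshold, inputs with fewer than 1024 elements run the sequential variant and larger inputs run the blocked one, without any change to the calling code.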


• Johan Enmyren, Christoph Kessler:
SkePU: A Multi-Backend Skeleton Programming Library for Multi-GPU Systems.
Proc. 4th Int. Workshop on High-Level Parallel Programming and
Applications (HLPP-2010), Baltimore, USA, Sep. 2010. ACM.

• Usman Dastgeer, Lu Li, Christoph Kessler:
Adaptive implementation selection in a skeleton programming library.
Proc. of the 2013 Biennial Conference on Advanced Parallel Processing
Technology (APPT-2013), Stockholm, Sweden, Aug. 2013. Springer LNCS
8299, pp. 170-183, 2013.

• Usman Dastgeer, Lu Li, Christoph Kessler:
The PEPPHER Composition Tool: Performance-Aware Composition for
GPU-based Systems.
Computing journal, Springer, ISSN 0010-485X (print) / 1436-5057
(online), Nov. 2013 (DOI 10.1007/s00607-013-0371-8).

• Usman Dastgeer, Christoph Kessler:
Performance-aware Composition Framework for GPU-based Systems.
The Journal of Supercomputing, Springer, Jan. 2014 (online); print
version to appear (2014).

• Usman Dastgeer:
Performance-aware Component Composition for GPU-based systems.
PhD thesis, Linköping Studies in Science and Technology No. 1581,
Linköping University, May 2014.