Covering all aspects of the polyhedral model (as this field is sometimes called) and presenting all results and available tools is impossible in one week. Thus, this first school on polyhedral code analysis and optimizations will cover only a selection of subjects. Other topics may be covered in the future if this event becomes the first of a series.
The school is organized from Monday 13 to Friday 17 (both lunches included), with 7 half-days of courses and 1 half-day of break (Wednesday afternoon). Each working half-day consists of a 3-hour course on a separate topic. The first afternoon gives an introduction to the school (a general presentation of the week, a “historical talk”, and some mathematical foundations). The other half-days cover:
- scheduling and code transformations
- algorithms and tools for integer sets and relations
- abstract interpretation
- compilation for distributed platforms with communication optimizations
- array region analysis
- vectorization and SIMD optimizations
Some additional talks may be proposed, after dinner, for the brave, as complements or to cover more focused topics. The schedule of courses is subject to change but should be as follows.
(Schedule table covering Mon. 13th through Fri. 17th; the detailed timetable is not reproduced here.)
Goals of the school, quick view of topics addressed in the past by polyhedral techniques, short presentation of those covered during the school, organization of courses and additional talks or panels, “social events”.
- The polyhedral model: past, present, future (Sanjay Rajopadhye, Colorado State University).
Those who do not study history are doomed to repeat it. In this talk, I therefore present the history of the polyhedral model, with the view that it is more than just a model for compiling loops to multi- and many-core processors. I also talk about the challenges that are coming up in exascale and beyond, and some opinions on how to adapt to them.
The elaboration of the polyhedral model resulted from a paradigm shift in program analysis: instead of reasoning at the level of statements, this model deals with instances, i.e., executions of statements. The resulting increase in precision is paid for by having to deal with very large or even infinite sets. This is only possible in intension, i.e., by representing each set as the solutions of a system of constraints. These constraints are usually taken to be affine inequalities, both because they fit well with the structure of for loops and because they are the subject of a rich mathematical theory. Hence instance sets, dependences, memory footprints, and schedules are all modeled as Z-polyhedra: sets of points with integer coordinates inside polyhedra.
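As a concrete (and deliberately naive) illustration of this instance-based view, the following sketch enumerates the Z-polyhedron that is the iteration domain of a small triangular loop nest. The names (`N`, `in_domain`) are illustrative, not from any particular tool; real polyhedral libraries manipulate such sets symbolically, without enumeration.

```python
# The iteration domain of the triangular loop nest
#   for i in range(N):
#       for j in range(i + 1):
#           S(i, j)
# is the Z-polyhedron { (i, j) : 0 <= i < N and 0 <= j <= i }.

N = 4

def in_domain(i, j, n=N):
    # Affine inequalities describing the set of statement instances.
    return 0 <= i < n and 0 <= j <= i

# Each point is one execution (instance) of the statement S.
instances = [(i, j) for i in range(N) for j in range(N) if in_domain(i, j)]
print(instances)       # [(0, 0), (1, 0), (1, 1), (2, 0), ...]
print(len(instances))  # N*(N+1)//2 = 10 instances
```

Note that a single statement in the source text corresponds to 10 instances here, and to unboundedly many once `N` becomes a symbolic parameter, which is why the constraint representation is essential.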
This talk reviews some of the algorithms for the manipulation of polyhedra, lattices and Z-polyhedra, including the emptiness test, minimization and maximization, scanning, counting, their complexity and possible approximations. It then discusses some recent advances in the manipulation of constraint systems: SMT solvers, quantifier elimination and polynomial solvers.
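One of the oldest algorithms behind such emptiness tests is Fourier-Motzkin elimination, which the sketch below implements in its simplest form. This decides only rational emptiness; deciding emptiness over the integers (the Z-polyhedron case) requires more machinery, and production libraries use far more sophisticated and scalable algorithms. The encoding (one row `[a1, ..., an, c]` per constraint `a·x + c >= 0`) is an assumption of this sketch.

```python
def fm_eliminate(cons, k):
    """Eliminate variable k from constraints a.x + c >= 0,
    each given as a row of coefficients with the constant last."""
    lowers = [c for c in cons if c[k] > 0]   # give lower bounds on x_k
    uppers = [c for c in cons if c[k] < 0]   # give upper bounds on x_k
    rest   = [c for c in cons if c[k] == 0]
    out = list(rest)
    for lo in lowers:
        for up in uppers:
            # Positive combination cancelling x_k: (-up[k])*lo + lo[k]*up
            out.append([(-up[k]) * a + lo[k] * b for a, b in zip(lo, up)])
    return out

def rational_empty(cons, nvars):
    for k in range(nvars):
        cons = fm_eliminate(cons, k)
    # Only constant rows remain: infeasible iff some constant is negative.
    return any(c[-1] < 0 for c in cons)

# { x : x >= 2 and x <= 1 } as rows [a, c] meaning a*x + c >= 0:
print(rational_empty([[1, -2], [-1, 1]], 1))  # True: empty
print(rational_empty([[1, 0], [-1, 3]], 1))   # False: 0 <= x <= 3
```

Each elimination step can square the number of constraints, which is one reason the complexity and approximation questions mentioned above matter in practice.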
Automatic and semi-automatic program optimization by means of loop transformations is a critical step in the quest for better program performance. Be it for multi-core CPUs, GPUs or FPGAs, complex sequences of loop transformations are often required to expose parallelism (both coarse-grain and fine-grain) and improve data locality (for instance via loop tiling). The ability to encompass a large set of program transformations in a single, well-understood optimization problem is one of the strongest assets of the polyhedral compilation model. In the polyhedral model, a sequence of loop transformations is represented by a schedule of each dynamic instance of the individual statements of a loop nest.
This course covers how traditional textbook loop transformations can be represented in the polyhedral model, and how semi-automated transformation techniques can be implemented using partially specified schedules. The algorithmic foundations of automatic scheduling in the polyhedral model using (Parametric) Integer Linear Programs are presented, building up from Feautrier’s original results on effective scheduling in the polyhedral model, to the latest results in building and optimizing on the set of all semantics-preserving affine schedules. We also cover some of the most popular polyhedral scheduling algorithms designed in the past two decades.
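A toy example of this schedule-based view (illustrative only, not a production scheduler): an affine schedule maps each statement instance to a multidimensional timestamp, and executing instances in lexicographic order of their timestamps realizes the transformation. Here `theta(i, j) = (j, i)` expresses loop interchange, and legality is checked against a hypothetical dependence from `(i, j)` to `(i+1, j)`.

```python
N = 3
instances = [(i, j) for i in range(N) for j in range(N)]

def theta(i, j):
    # Affine schedule expressing loop interchange: j becomes outermost.
    return (j, i)

# Python tuples compare lexicographically, so sorting by the schedule
# yields the transformed execution order.
order = sorted(instances, key=lambda ij: theta(*ij))
print(order)  # [(0, 0), (1, 0), (2, 0), (0, 1), ...]

# Legality: every dependence source must be scheduled strictly before
# its target (lexicographically).
deps = [((i, j), (i + 1, j)) for i in range(N - 1) for j in range(N)]
print(all(theta(*s) < theta(*t) for s, t in deps))  # True: interchange legal
```

Automatic scheduling algorithms, such as those covered in the course, search for the coefficients of such affine functions subject to exactly this kind of legality constraint, typically via (parametric) integer linear programming.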
In the first part of this course, we explain how to model polyhedral analysis tasks using a higher level of abstraction in terms of sets and relations of named integer tuples. In particular, we show how to describe constraints of these sets and relations in first-order logic, with special attention to Presburger arithmetic. We present an overview of the available operations, the tools that support them, and examples of how to use these operations to perform polyhedral code analysis.
In the second part, we delve into the internal representation and explain how some of the available operations are implemented. This set of operations includes low-level set operations and weighted counting. In these descriptions, we mainly focus on algorithms implemented in the libraries isl and barvinok.
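The flavor of these operations can be conveyed with a finite stand-in: below, sets are plain Python sets of named tuples and relations are sets of pairs, whereas isl performs the same operations symbolically on possibly infinite Presburger sets (and barvinok counts their points). The names `apply_rel` and `compose` are illustrative, not isl API.

```python
# { S[i] : 0 <= i < 5 } and the dependence relation { S[i] -> S[i+1] }.
domain = {("S", i) for i in range(5)}
dep    = {(("S", i), ("S", i + 1)) for i in range(4)}

def apply_rel(rel, s):
    # Image of a set under a relation.
    return {y for (x, y) in rel if x in s}

def compose(r2, r1):
    # Relation composition: r2 after r1.
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

print(apply_rel(dep, {("S", 0)}))  # {('S', 1)}
print(len(compose(dep, dep)))      # 3: the pairs S[i] -> S[i+2], 0 <= i < 3
```

The point of the symbolic representation is that the same operations remain exact when the domain is parametric or unbounded, where enumeration as above is impossible.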
Abstract interpretation is a general theory of the approximation of semantics that eases the systematic development of static analyzers that are approximate, sound by construction, and achieve a predefined cost versus precision trade-off. This course presents an introduction to abstract interpretation and its application to the automatic inference of numeric invariants of programs. A static analysis by abstract interpretation is parametrized by abstract domains, which correspond to a choice of properties of interest (semantic choice) together with effective data-structures and algorithms to manipulate them in a computer (algorithmic choice).
After presenting the generic construction of a static analyzer, we present several numeric abstract domains, including the classic polyhedra domain as well as more recent proposals that achieve different cost/precision trade-offs: restrictions, such as octagons and template polyhedra, and extensions, such as interval polyhedra. We discuss in detail the operations that are specific to abstract interpretation (such as the widening) and distinguish polyhedral abstract domains from other polyhedral methods. If time permits, we will discuss the challenges in applying these abstract domains to the semantics of the actual numeric data-types employed by programs, such as machine integers and floating-point numbers. We may alternatively (or additionally) discuss our experience building Astrée, an industrial static analyzer that checks for the absence of run-time errors in critical embedded C software.
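As a small taste of these ideas, here is a sketch of the interval domain, one of the simplest numeric abstract domains, with its join and a standard widening. Analyzing `x = 0; while ...: x = x + 1` by plain fixpoint iteration would not terminate on an unbounded loop; widening jumps to a stable, sound over-approximation. The code is a minimal illustration, not drawn from any particular analyzer.

```python
INF = float("inf")

def join(a, b):
    # Least upper bound of two intervals.
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(a, b):
    # Widening: drop any bound that is still growing to +/- infinity.
    lo = a[0] if a[0] <= b[0] else -INF
    hi = a[1] if a[1] >= b[1] else INF
    return (lo, hi)

# Abstract fixpoint iteration for: x = 0; while ...: x = x + 1
x = (0, 0)
while True:
    body = (x[0] + 1, x[1] + 1)      # abstract effect of x = x + 1
    nxt = widen(x, join(x, body))
    if nxt == x:
        break
    x = nxt
print(x)  # (0, inf): a sound over-approximation of all reachable values
```

The trade-off is visible even here: the result is sound but imprecise (a narrowing pass or a loop-guard constraint could recover a finite upper bound), which is exactly the cost-versus-precision choice the course discusses.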
This course covers the problem of compilation and automatic parallelization for heterogeneous architectures with distributed memory. In particular, the problem of automatically generating data movement and communication code, and performing communication optimizations in the context of multi-GPU systems and distributed-memory clusters is presented. A tool demonstration is also included inline for illustration.
Connected to this topic, memory optimization in the polyhedral framework is also presented. We give a survey of works on memory optimization (also known as storage optimization) using the polyhedral framework. Memory optimization is key in a variety of domains such as embedded computing, data flow language compilation, compilation for accelerators such as FPGAs, and domain-specific language compilation. We describe the strengths and limitations of the state of the art, and highlight the need for more powerful frameworks and tools.
Compiling sequential programs for parallel homogeneous or heterogeneous machines, with either shared or distributed memory, requires knowledge of memory accesses and data flows to support parallelization and/or the generation of communications. Scientific computation-intensive applications have long been prime candidates for study in this field, in particular because of their use of the simple data structures provided by the Fortran language, namely arrays.
Many optimization approaches, in particular in the polyhedral realm, are based on a fine-grained study of the relationships between individual scalar or array memory references. They aim at supporting optimal program transformations, but they enforce restrictions on the input code, notably the absence of function calls. In contrast, array element set analyses rely on summarization and approximation techniques to collect information at a coarser granularity, thus allowing function calls or less structured code, at the expense of precision. These techniques have nevertheless proved effective on real-life applications, in particular array region analyses, which are based on convex polyhedra. They are implemented in the PIPS parallelizing compiler, and extensively used in Par4All, a source-to-source platform targeting various architectural models.
In this presentation, we first set the theoretical bases for array element set analyses. We particularly emphasize the problems due to approximations when using convex sets. We then detail the computation of Read, Write, In, and Out sets for the main syntactic constructs of the Fortran language, highlighting the reuse of the results of other analyses and the elementary operations on convex polyhedra. We then show how array region analyses can be extended to handle other data structures such as those provided by the C language, and how pointers can be taken into account. In a second part, we detail several applications, such as loop parallelization, transformations aiming at enhancing data locality or reducing memory footprint, generation of data transfers, or even software verification.
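A toy illustration of the Read/Write sets and of the approximation issue: for the loop `for i in range(n): A[i + 1] = A[i] + B[2 * i]`, the sketch below collects the exact element sets read and written, and an interval over-approximation of each, standing in for the convex polyhedral regions computed symbolically by tools such as PIPS. All names are illustrative.

```python
n = 5

read_A  = {i for i in range(n)}        # elements A[i] read
write_A = {i + 1 for i in range(n)}    # elements A[i+1] written
read_B  = {2 * i for i in range(n)}    # elements B[2*i] read

def convex_hull(s):
    # Interval over-approximation of an element set (a 1-D "convex region").
    return (min(s), max(s))

print(convex_hull(read_A))   # (0, 4): exact here
print(convex_hull(write_A))  # (1, 5): exact here
print(convex_hull(read_B))   # (0, 8): includes odd indices never read
```

The region of `B` shows the loss of precision emphasized above: the convex over-approximation covers the odd indices even though only even elements are touched, which is sound for parallelization and communication generation but may be conservative.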
All processors today use some flavor of SIMD (Single Instruction Multiple Data) hardware parallelism, and recent trends show that the degree of SIMD parallelism is increasing. Therefore, the generation of code that makes effective use of SIMD parallelism is critical for sustained single-chip performance. This session focuses on the use of the polyhedral model to automatically transform affine computations for effective utilization of SIMD parallelism.
First, we summarize the essential characteristics and important factors of performance for two broad classes of SIMD architectures: wide-vector GPUs and short-vector SIMD ISAs. We review the state-of-the-art in vectorization in production compilers like gcc and identify opportunities for improved vectorization through polyhedral transformations. Then, we describe how polyhedral scheduling can be augmented to extract vectorizable loops. We continue with a presentation of polyhedral transformations that can be developed to generate effective code for GPUs as well as for short-vector SIMD architectures. Finally, we present a recently-developed approach to vectorization that uses polyhedral transformations to identify small tiles, called vectorizable codelets, with specific properties that enable coupling with a backend non-polyhedral codelet optimizer. We conclude with performance results attained automatically with the vectorizable codelets approach.
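The core legality question behind extracting vectorizable loops can be stated in a few lines. The toy check below (not taken from any named tool) assumes each loop-carried dependence of the innermost loop is summarized by a constant distance `d`; packing `w` consecutive iterations into one vector is safe when every carried dependence has distance at least `w`.

```python
def vectorizable(distances, w):
    """Can the innermost loop be vectorized with vector width w?
    distances: dependence distances on that loop (0 = loop-independent)."""
    return all(d == 0 or d >= w for d in distances)

print(vectorizable([0], 4))  # True:  no loop-carried dependence
print(vectorizable([1], 4))  # False: A[i] = A[i-1] + ... recurrence
print(vectorizable([4], 4))  # True:  distance 4 tolerates width 4
```

Polyhedral scheduling generalizes this: rather than testing a fixed loop, it searches for a transformation that moves such short-distance dependences away from the innermost dimension, so that a vectorizable loop (or codelet) appears there.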