Comp 515—Advanced Compilation for High Performance Computers

Syllabus

Prof. Ken Kennedy

Spring 2006

  1.  Introduction

Parallel and vector architectures. The problem of parallel programming. Bernstein's conditions and the role of dependence. Compilation for parallel machines and automatic detection of parallelism.

  1.  Dependence Theory

Fundamentals of dependence, dependence in loops, types of dependences. Direction vectors and distance vectors. Fundamental theorem of parallelization.

  1.  Dependence Testing

Testing for dependence: separable, gcd and Banerjee tests. Exact dependence testing. Construction of direction and distance vectors.

  1.  Preliminary Transformations

Loop normalization, scalar data flow analysis, induction variable substitution, scalar renaming.

  1.  Fine-Grain Parallel Code Generation

Loop distribution and its safety. The Kuck vectorization principle. The layered vector code-generation algorithm and its complexity. Loop interchange.

  1.  Coarse-Grain Parallel Code Generation

Loop Interchange. Loop Skewing. Scalar and array expansion. Forward substitution. Alignment. Code replication. Array renaming. Node splitting. Pattern recognition. Threshold analysis. Symbolic dependence tests. Parallel code generation and its problems.

  1.  Control Dependence

Types of branches. If conversion. Control dependence. Program dependence graph.

  1.  Memory Hierarchy Management

The use of dependence in scalar register allocation and management of the cache memory hierarchy. Topics include scalar replacement, unroll-and-jam, loop alignment, cache blocking, and prefetching.

  1.  Scheduling for Superscalar and Parallel Machines Machines

Role of dependence. List Scheduling. Software Pipelining. Work scheduling for parallel systems. Guided Self-Scheduling

  1.  Interprocedural Analysis and Optimization

Side effect analysis, constant propagation and alias analysis. Flow-insensitive and flow-sensitive problems. Side effects to arrays. Inline substitution, linkage tailoring and procedure cloning. Management of interprocedural analysis and optimization.

  1.  Compilation of Other Languages.

C, Verilog, Fortran 90, HPF.

Text

Allen and Kennedy, Optimizing Compilers for Modern Architectures, Morgan-Kaufmann, Second Printing, 2005.

References

Banerjee, Dependence Analysis, Kluwer Academic Publishers.

Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, 1996.

Wolfe, Optimizing Supercompilers for Supercomputers, MIT Press.

Zima and Chapman, Supercompilers for Parallel and Vector Computers, ACM Press.

Course Requirements

Class Presentations (50%).  Each student will be required to present at least one course topic and participate in the discussions of presentations by others.

Oral Examination (50%).  There will be a final oral exam.