Rice Computer Science Colloquia
Rice University
Department of Computer Science
presents

Lawrence Rauchwerger

Assistant Professor of Computer Science
Texas A&M University

Run-Time Parallelization: A Framework for Parallel Computation

Abstract

The goal of parallelizing, or restructuring, compilers is to detect and exploit parallelism in sequential programs written in conventional languages. Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, well-behaved, statically analyzable access patterns. However, if the memory access pattern of the program depends on its input data, then static data dependence analysis is impossible and consequently parallelization cannot be performed at compile time. Moreover, in this case the compiler cannot apply privatization and reduction parallelization, the transformations that have proven to be the most effective in removing data dependences and increasing the amount of exploitable parallelism in the program. Typical examples of irregular, dynamic applications are complex simulations such as SPICE for circuit simulation, DYNA-3D for structural mechanics modeling, DMOL for quantum mechanical simulation of molecules, and CHARMM for molecular dynamics simulation of organic systems.

Because irregular programs represent a large and important fraction of applications, we advocate the development of an automatable framework for run-time parallelization. Fully parallel loops arise frequently in practice, and their parallelism can be exploited in the most efficient and scalable manner. As a component of this framework we therefore propose a novel strategy for identifying them: speculatively execute the loop as a doall, and then apply a fully parallel data dependence test to determine whether it had any cross-iteration dependences; if the test fails, the loop is re-executed serially. To increase the exploitable parallelism in the loop, our methods can speculatively apply the privatization and reduction parallelization transformations and check their validity at run time. We present experimental results on loops from the PERFECT Benchmarks which show that these techniques can indeed yield significant speedups.
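The speculate, test, and fall-back strategy described in the abstract can be illustrated with a small sketch. This is not the speaker's implementation. It assumes a hypothetical loop of the form a[writes[i]] = f(a[reads[i]]), whose index arrays are known only at run time. The names run_serial and run_speculative are invented for illustration, and the dependence test here runs sequentially rather than in parallel. Privatization and reduction handling are omitted.

```python
from typing import Callable, Dict, List, Set, Tuple


def run_serial(a: List[int], reads: List[int], writes: List[int],
               f: Callable[[int], int]) -> List[int]:
    """Reference semantics: execute the iterations one after another."""
    out = list(a)
    for i in range(len(reads)):
        out[writes[i]] = f(out[reads[i]])
    return out


def run_speculative(a: List[int], reads: List[int], writes: List[int],
                    f: Callable[[int], int]) -> Tuple[List[int], bool]:
    """Run the loop as if it were a doall, then test for dependences.

    Returns (result, speculation_succeeded).
    """
    snapshot = list(a)  # checkpoint, needed if speculation fails
    out = list(a)
    # Shadow structures (assumption: dicts of sets stand in for shadow arrays).
    read_by: Dict[int, Set[int]] = {}
    written_by: Dict[int, Set[int]] = {}

    # Each iteration reads only the checkpointed state, which models
    # iterations running concurrently without seeing each other's writes.
    for i in range(len(reads)):
        read_by.setdefault(reads[i], set()).add(i)
        written_by.setdefault(writes[i], set()).add(i)
        out[writes[i]] = f(snapshot[reads[i]])

    # Post-execution test: the loop is fully parallel if no element is
    # written by two iterations, and no element written by one iteration
    # is read by a different one.
    independent = True
    for elem, writers in written_by.items():
        if len(writers) > 1 or (read_by.get(elem, set()) - writers):
            independent = False
            break

    if independent:
        return out, True
    # Test failed: discard speculative state and re-execute serially.
    return run_serial(snapshot, reads, writes, f), False
```

With a = [1, 2, 3, 4] and f multiplying by 10, the index arrays reads = [0, 1], writes = [2, 3] pass the test and the speculative result is kept. With reads = [0, 2], writes = [2, 3], the second iteration reads an element the first one wrote, so the test fails and the loop is re-executed serially from the checkpoint.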

Monday, May 12 @ 1:30 p.m. in Duncan Hall 1064
Reception to follow in Duncan Hall 3092