Rice Computer Science-Colloquia
Rice University
Department of Computer Science
presents
Lawrence Rauchwerger
Assistant Professor of Computer Science
Texas A&M University
Run-Time Parallelization: A Framework for Parallel Computation
Abstract
The goal of parallelizing, or restructuring, compilers is to detect
and exploit parallelism in sequential programs written in conventional
languages. Current parallelizing compilers do a reasonable job of extracting
parallelism from programs with regular, well-behaved, statically analyzable
access patterns. However, if the memory access pattern of the program is
input-data dependent, then static data dependence analysis is impossible
and consequently parallelization cannot be performed at compile-time.
Moreover, in this case the compiler cannot apply privatization and reduction
parallelization, the transformations that have proven to be the most
effective in removing data dependences and increasing the amount of
exploitable parallelism in the program. Typical examples of irregular,
dynamic applications are complex simulations such as SPICE for circuit
simulation, DYNA-3D for structural mechanics modeling, DMOL for quantum
mechanical simulation of molecules, and CHARMM for molecular dynamics
simulation of organic systems.
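
As a small illustration of an input-data-dependent access pattern, the sketch below uses a loop whose subscripts come from index arrays. The function name run_loop, the arrays w and r, and the loop body are hypothetical, chosen only for illustration and not taken from the talk. Whether the iterations are independent cannot be decided from the loop text alone; it depends on the contents of w and r, which are known only at run time.

```python
def run_loop(a, w, r, order):
    """Run the loop body a[w[i]] = a[r[i]] + 1 over the iterations in `order`.

    w and r are index arrays supplied as input, so the elements each
    iteration touches are unknown until the program runs.
    """
    a = list(a)  # work on a copy so the caller's data is untouched
    for i in order:
        a[w[i]] = a[r[i]] + 1
    return a


a0 = [0] * 8
forward = range(4)
backward = range(3, -1, -1)

# Input A: writes go to elements 0..3, reads come from elements 4..7.
# No iteration touches an element written by another, so order is irrelevant.
w_a, r_a = [0, 1, 2, 3], [4, 5, 6, 7]
independent = run_loop(a0, w_a, r_a, forward) == run_loop(a0, w_a, r_a, backward)

# Input B: iteration i reads the element that iteration i-1 wrote,
# so the same loop text now carries a cross-iteration flow dependence.
w_b, r_b = [1, 2, 3, 4], [0, 1, 2, 3]
dependent = run_loop(a0, w_b, r_b, forward) != run_loop(a0, w_b, r_b, backward)
```

The same source loop is fully parallel for input A and strictly sequential for input B, which is why a compile-time analysis has to assume the worst case.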
Because irregular programs represent a large and important
fraction of applications, we advocate the development of an automatable
framework for run-time parallelization. Since fully parallel loops arise
frequently in practice and their parallelism can be exploited in the most
efficient and scalable manner, as a component of this framework we propose
a novel strategy for their identification: speculatively execute the loop
as a doall, and then apply a fully parallel data dependence test to determine
if it had any cross-iteration dependences; if the test fails, then the loop
is re-executed serially. To increase the exploitable parallelism in the loop,
our methods can speculatively apply the privatization and reduction
parallelization transformations and check their validity at run-time.
We present experimental results on loops from the PERFECT Benchmarks which
show that these techniques can indeed yield significant speedups.
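
The speculate-then-test strategy described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the speaker's actual test: the function name speculative_doall, the loop body, and the use of dictionaries to record accesses are all hypothetical. The speculative phase and the test are written sequentially for clarity, although both are meant to run in parallel in the framework described, and the sketch does not attempt privatization or reduction parallelization.

```python
def speculative_doall(a, w, r):
    """Speculatively run a[w[i]] = a[r[i]] + 1 as if it were a doall.

    Returns (result, ran_as_doall). The loop body and the access
    bookkeeping are illustrative assumptions.
    """
    n = len(w)
    checkpoint = list(a)   # saved state, needed if speculation fails
    work = list(a)         # array updated by the speculative run
    writers = {}           # element -> set of iterations that wrote it
    readers = {}           # element -> set of iterations that read it

    # Speculative phase: every iteration reads pre-loop values only, so the
    # iterations do not rely on one another and could execute concurrently.
    for i in range(n):
        readers.setdefault(r[i], set()).add(i)
        writers.setdefault(w[i], set()).add(i)
        work[w[i]] = checkpoint[r[i]] + 1

    # Test phase: the speculation is valid only if no element is written by
    # one iteration and read or written by a different iteration.
    valid = True
    for elem, ws in writers.items():
        if len(ws) > 1:                      # two iterations wrote elem
            valid = False
        if readers.get(elem, set()) - ws:    # another iteration read elem
            valid = False

    if valid:
        return work, True

    # Test failed: discard the speculative result and re-execute serially
    # from the checkpoint.
    serial = list(checkpoint)
    for i in range(n):
        serial[w[i]] = serial[r[i]] + 1
    return serial, False
```

With w = [0, 1, 2, 3] and r = [4, 5, 6, 7] the test passes and the speculative result is kept; with w = [1, 2, 3, 4] and r = [0, 1, 2, 3] the test detects the cross-iteration dependence and the loop is re-executed serially. In both cases the returned array matches what the original sequential loop would produce. Note that this sketch rejects any element written by more than one iteration, which is exactly the kind of case that speculative privatization might otherwise recover.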
Monday, May 12 @ 1:30 p.m. in Duncan Hall 1064
Reception to follow in Duncan Hall 3092