Overview of the dHPF Compiler Project
For high-level, data-parallel languages to achieve wide acceptance, it will
be essential to have parallelizing compilers that provide
consistently high performance for a broad spectrum of scientific
applications on a variety of parallel architectures.
The dHPF project aims to address this need by pursuing three goals:
- Develop compilation techniques that narrow the gap between the
performance of data-parallel languages and hand-coded programs.
- Develop optimization techniques for a range of parallel architecture
classes including message-passing, shared-memory, and cluster systems.
- Develop compiler technology to support new language features that
broaden the applicability of data-parallel languages.
Recent Results and Current Directions
A computation partitioning framework and associated optimizations
In data-parallel compilers, aggressive static computation partitioning
can be a key determinant of performance. The Rice dHPF compiler is
based on a computation partitioning (CP) framework that supports a
more general class of partitionings than previous compilers.
The framework enables a number of aggressive computation partitioning optimizations:
- maximizing parallelism in the presence of arbitrary control flow;
- computation partition selection for computations that use
- integrated CP selection and communication-sensitive
- interprocedural selection of computation partitions; and
- an innovative approach to reduce the frequency of communication
across time-steps of iterative stencil codes.
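To make the partitioning idea concrete, here is a small Python sketch (our
illustration, not dHPF's actual framework or generated code) of the
owner-computes rule for a 1-D Jacobi stencil under an HPF-style BLOCK
distribution: each processor computes exactly the interior iterations that
write data it owns. The helper names (block_owner, local_iterations,
jacobi_step) are hypothetical.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are BLOCK-distributed over p processors."""
    block = (n + p - 1) // p          # ceiling block size, as in HPF BLOCK
    return i // block

def local_iterations(proc, n, p):
    """Interior iterations 1..n-2 assigned to `proc` by the owner-computes rule:
    a processor executes the iterations whose left-hand side it owns."""
    return [i for i in range(1, n - 1) if block_owner(i, n, p) == proc]

def jacobi_step(a):
    """One serial time step of the stencil a[i] = (a[i-1] + a[i+1]) / 2."""
    return [a[0]] + [(a[i-1] + a[i+1]) / 2 for i in range(1, len(a) - 1)] + [a[-1]]

n, p = 8, 2
# Processor 0 owns elements 0..3 and so computes interior iterations 1..3;
# iterations 4..6 fall to processor 1.  The boundary pair (3,4) is where
# ghost-cell communication would be needed on each time step.
print(local_iterations(0, n, p))   # -> [1, 2, 3]
print(local_iterations(1, n, p))   # -> [4, 5, 6]
```

In a real compiler the partitioning must also handle arbitrary control flow
and multi-dimensional, interprocedural cases; this sketch only shows the
basic ownership computation that such a framework generalizes.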
These optimizations have led to some exciting results. For example,
a common problem with current compilers for High Performance Fortran
(HPF) is that substantial restructuring and hand-optimization
may be required to obtain acceptable performance from an
HPF port of an existing Fortran application.
For the NAS SP and BT application benchmarks compiled with dHPF,
however, we achieve performance within 15% of hand-written MPI
code on 25 processors for BT and within 33% for SP.
Furthermore, these results are obtained with HPF versions of
the benchmarks that were created with minimal restructuring
of the serial code (modifying only approximately 5% of the code).
Many of the optimizations above were crucial in obtaining this performance.
This research is described in the following papers:
- An SC98 paper uses the NAS benchmarks SP and BT to motivate
and describe the optimizations required to achieve high performance in
modern scientific codes with minimal restructuring.
- A new paper describes the computation partitioning framework
in dHPF and the CP selection algorithms enabled by this framework, as
described above. The paper also evaluates the impact of the
individual optimizations on the NAS application benchmarks and the
SPEC benchmark Tomcatv.
An integer set framework for data-parallel program optimization
Language and compiler support for out-of-core computations
As part of the Scalable I/O Consortium, we are developing language and
compiler support for high-performance computation using very large data sets,
commonly referred to as "out-of-core".
Our approach extends the HPF data-distribution mechanism
to support partitioning data between memory and disks.
This guides the compiler in optimizing parallel I/O operations
and shields the application programmer
from architectural and file-system details.
- A paper introducing out-of-core directives and comparing application
performance using explicit I/O and virtual memory.
- A joint
conference paper with researchers at Syracuse University
describing models for out-of-core compilation
and I/O run-time systems.
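The core out-of-core idea can be sketched in a few lines of Python (an
illustration of the concept, not dHPF's generated code): a computation over
a data set too large for memory is performed tile by tile, with the tile
size playing the role of the memory/disk "distribution" parameter. The
function name out_of_core_sum is hypothetical.

```python
import array
import os
import tempfile

def out_of_core_sum(path, n, tile_elems):
    """Sum n float64 values stored on disk, holding at most tile_elems
    in memory at a time -- the essence of an out-of-core computation."""
    total = 0.0
    with open(path, "rb") as f:
        remaining = n
        while remaining > 0:
            chunk = array.array("d")
            count = min(tile_elems, remaining)
            chunk.fromfile(f, count)   # read one in-core tile from disk
            total += sum(chunk)
            remaining -= count
    return total

# A small file stands in for a very large data set.
path = tempfile.mktemp()
with open(path, "wb") as f:
    array.array("d", [float(i) for i in range(1000)]).tofile(f)

print(out_of_core_sum(path, 1000, 64))   # -> 499500.0
os.remove(path)
```

A compiler given an out-of-core distribution directive can generate this
kind of tiled loop automatically, choosing tile sizes and scheduling the
I/O so the programmer never manages the disk traffic by hand.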
Optimizations for SMP and DSM systems
We are experimenting with strategies for achieving high performance
with HPF on shared memory systems, including data layout choices,
distributed memory optimization techniques, memory hierarchy
management, prefetching, and dynamic computation partitioning.
If you have questions or comments about the D System or this repository,
please contact firstname.lastname@example.org.
Last updated on 24 July 1997.