Overview of the dHPF Compiler
Project Leaders:
Ken Kennedy, John Mellor-Crummey and Vikram Adve
Current Participants:
Arun Chauhan,
Chen Ding,
Katherine Fletcher,
Robert Fowler,
Charles Koelbel,
Bo Lu,
Collin McCurdy,
Nat McIntosh,
Monika Mevenkamp,
Dejan Mircevski,
Mike Paleczny,
Ajay Sethi,
Lisa Thomas,
Lei Zhou
Parallel Compiler and Tools Group
Center for Research on Parallel Computation
Rice University
-last updated May 1996 by johnmc@cs.rice.edu
Motivation
HPF compiler technology must be expanded in 3 directions:
Compiler Goals
Fortran D95 Language
Fortran D95 is designed to support research on
data-parallel programming in High Performance Fortran (HPF) and to
explore extensions that would broaden HPF's applicability or enhance
performance.
dHPF Compiler Organization
Front End
Purpose: parsing, semantic checking of HPF directives, and preprocessing code for further analysis
Limitations (May 1996)
Preliminary Communication Placement
Purpose:
provide feedback to the computation partitioner about where
(conservatively) communication might be needed
Strategy
Limitations (May 1996)
Computation Partitioning Selection
Purpose:
a framework to evaluate and select from several computation partitioning
alternatives, not restricted to the owner-computes rule.
Limitations (May 1996)
Communication Refinement
Purpose:
given a computation partition choice, CP, determine and optimize the communication required
[example]
Limitations (May 1996)
Code Generation Overview
Partitioning and communication based on Omega Library for integer set manipulation
Code Generation for Computation Partitioning
Omega Library (University of Maryland)
Omega-Based Framework for Data Parallel Code Generation
Omega-Based Framework: Example
Optimizations using the Omega Framework - I
Optimizations using the Omega Framework - II
Compiled Examples
Features
[figure: compiler phases, in order: Front End; Preliminary Communication Placement and Computation Partitioning; Communication Refinement; Code Generation. Other labels in the figure: Parallelism; Communication Placement]
using contextual information from enclosing scopes
Local := { [i,j] : 25 * p1 + 1 <= i <= 25 * p1 + 25 &&
(exists k: 4 * k = j - p2 && 1 <= j <= 100) }
RefSubscript := { [i,j] -> [i',j'] : i' = i - 1 && j' = j }
DataAccessedByRef := RefSubscript(LocalIterSet)
for(t1 = 25*mypid1+1; t1 <= 25*mypid1+25; t1++)
  for(t2 = 1+(mypid2-(1))%4; t2 <= 100; t2 += 4)
    s1(t1,t2)
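As a sanity check on the loop nest above, the following Python sketch enumerates the set Local directly from its definition and compares it with the iterations the generated loops visit. Two things here are assumptions rather than facts from the source: the 4 x 4 processor grid (p1 and p2 each in 0..3), and the reading of `%` as a non-negative modulo, which is what Python's `%` computes. With C's truncating `%`, the inner loop would start at 0 when mypid2 = 0.

```python
def local_set(p1, p2):
    """Enumerate Local from its definition for processor (p1, p2)."""
    return {
        (i, j)
        for i in range(25 * p1 + 1, 25 * p1 + 25 + 1)
        for j in range(1, 101)
        if (j - p2) % 4 == 0  # exists k: 4 * k = j - p2
    }


def generated_loop(mypid1, mypid2):
    """Iterations visited by the generated loop nest, in execution order."""
    visited = []
    for t1 in range(25 * mypid1 + 1, 25 * mypid1 + 25 + 1):
        # Python's % is non-negative for a positive modulus (assumed semantics).
        for t2 in range(1 + (mypid2 - 1) % 4, 100 + 1, 4):
            visited.append((t1, t2))
    return visited


def check_all(grid=4):
    """Compare both enumerations on an assumed grid x grid processor array."""
    for p1 in range(grid):
        for p2 in range(grid):
            visited = generated_loop(p1, p2)
            if len(visited) != len(set(visited)):
                return False  # an iteration was visited twice
            if set(visited) != local_set(p1, p2):
                return False  # loop bounds do not match the set
    return True
```

Calling `check_all()` returns `True` under these assumptions, meaning each processor's loop nest visits every element of its Local set exactly once.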
3 types of sets
Data: a set of data elements
Iterations: a set of loop iterations
Processors: a set of processors
3 types of mappings
Layout: data <-> processors
Reference: iterations <-> data
Comp.Part.: iterations <-> processors
# !HPF$ distribute A(BLOCK) on P(4) ! sic
# do i = 1, 100
# ... = A(i-1) + ... ! non-local read
# enddo
symbolic MYPID, Q
Loop := { [i] : 1 <= i <= 100 }
RefSubscript := { [i] -> [i-1] }
MyArraySection := { [i] : 25 * MYPID + 1 <= i <= 25 * MYPID + 25 }
LocalIterSetForQ := { [i] : iter i is executed by processor `Q'}
ReadSectionForQ := RefSubscript(LocalIterSetForQ) # composition
SendToQ := ReadSectionForQ intersection MyArraySection
codegen SendToQ
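The set computation above can be mimicked with ordinary Python sets standing in for the Omega library. This is only a sketch under stated assumptions: processors are numbered 0..3, processor p owns A(25p+1 .. 25p+25) (matching the bounds used for Local earlier in this document), and LocalIterSetForQ follows a hypothetical owner-computes partition in which iteration i runs on the owner of element i. The source leaves the computation partition abstract, and the helper names below are illustrative.

```python
NPROCS = 4   # from P(4)
BLOCK = 25   # 100 elements / 4 processors

LOOP = set(range(1, 101))  # Loop := { [i] : 1 <= i <= 100 }


def array_section(p):
    """Elements of A owned by processor p (assumed block: 25p+1 .. 25p+25)."""
    return set(range(BLOCK * p + 1, BLOCK * p + BLOCK + 1))


def ref_subscript(iters):
    """Image of an iteration set under RefSubscript := { [i] -> [i-1] }."""
    return {i - 1 for i in iters}


def local_iter_set(q):
    """Iterations run by q under the assumed owner-computes partition."""
    return LOOP & array_section(q)


def send_to(mypid, q):
    """SendToQ: elements that q reads and that are stored on mypid."""
    read_section_for_q = ref_subscript(local_iter_set(q))
    return read_section_for_q & array_section(mypid)
```

Under these assumptions, `send_to(0, 1)` is `{25}`: processor 1 executes iterations 26..50, reads A(25..49), and only A(25) lives on processor 0. For `mypid == q` the same expression yields the locally satisfied reads, so a real code generator would skip that case when emitting messages.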
NonLocalIterSet := { iterations that access potentially non-local data }
LocalIterSet := { iterations that access only local data }
NonLocalReadIterSet := { iterations that read non-local data }
NonLocalWriteIterSet := { iterations that write non-local data }
LocalIterSet := { iterations that access only local data }
-- SEND for non-local READ --
COMPUTE NonLocalWriteIterSet
-- SEND for non-local WRITE --
COMPUTE LocalIterSet
-- RECV for non-local READ --
COMPUTE NonLocalReadIterSet
-- RECV for non-local WRITE --
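The iteration sets and compute order listed above can be illustrated with a small Python sketch. The statement used in the usage note, `B(i+1) = A(i-1)` with B laid out like A, is hypothetical and chosen only so that both non-local sets are non-empty. The sketch also assumes each iteration reads one element and writes one element, and that no iteration is non-local for both its read and its write.

```python
def split_iterations(iters, read_ref, write_ref, owned):
    """Split iters by whether the element each one reads/writes is local.

    read_ref and write_ref map an iteration to the single element it reads
    or writes; owned is the set of elements stored on this processor.
    """
    iters = set(iters)
    nonlocal_read = {i for i in iters if read_ref(i) not in owned}
    nonlocal_write = {i for i in iters if write_ref(i) not in owned}
    local = iters - nonlocal_read - nonlocal_write
    return local, nonlocal_read, nonlocal_write


def compute_order(local, nonlocal_read, nonlocal_write):
    """Return the three COMPUTE phases in the order listed in the text."""
    if nonlocal_read & nonlocal_write:
        # Simplification of this sketch, not a restriction from the source.
        raise ValueError("sketch assumes disjoint non-local sets")
    return [
        ("NonLocalWriteIterSet", sorted(nonlocal_write)),
        ("LocalIterSet", sorted(local)),
        ("NonLocalReadIterSet", sorted(nonlocal_read)),
    ]
```

For a processor that owns elements 26..50 and executes iterations 26..50, `split_iterations(range(26, 51), lambda i: i - 1, lambda i: i + 1, set(range(26, 51)))` puts iteration 26 in the non-local read set, iteration 50 in the non-local write set, and 27..49 in the local set. The ordering presumably lets the local iterations run while messages are in flight: non-local writes are computed first so their results can be sent early, and non-local reads run last, after the data they need has been received.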