Abstract
We present optimization techniques for efficient data parallel formulation/implementation of highly irregular problems, and apply the techniques to O(N) hierarchical N-body methods for large-scale N-body simulations. The performance results demonstrate that highly irregular scientific and engineering problems such as nonadaptive and adaptive O(N) hierarchical N-body methods can be efficiently implemented in high-level data parallel languages such as High Performance Fortran (HPF) on scalable parallel architectures. Previously, parallel scalable implementations of these hierarchical methods have been successful only with explicit message--passing programming.
First, we present optimization techniques for efficient data parallel implementation of irregular problems, focusing on minimizing the data movement through careful management of the data distribution and the data references, both between the memories of different nodes, and within the memory hierarchy of each node. For nonadaptive hierarchical N-body methods, our optimizations on improving arithmetic efficiencies include recognizing dominating computations as matrix-vector multiplications and aggregating them into multiple-instance matrix-matrix multiplications. Experimental results with an implementation in CM Fortran of Anderson's hierarchical N-body method demonstrate that performance competitive to that of the best message-passing implementations of the same class of methods can be achieved.
We also present a general data parallel formulation for highly irregular applications, and apply the formulation to an adaptive hierarchical N-body method with highly nonuniform particle distributions. The formulation consists of (1) a method for linearizing irregular data structures, (2) a data parallel implementation (in HPF) of graph partitioning algorithms applied to the linearized data structure, and (3) techniques for expressing irregular communications and nonuniform computations associated with the elements of linearized data structures. Experimental results demonstrate that efficient data parallel (HPF) implementations of highly irregular problems are feasible with the proper language/compiler/runtime support.