Multipartitioning is a strategy for partitioning multi-dimensional arrays among a collection of processors so that line-sweep computations can be performed efficiently. The principal property of a multipartitioned array is that for a line sweep along any array dimension, all processors have the same number of tiles to compute at each step in the sweep, in other words, it describes a

latin hyper-rectangle, natural extension of the notion oflatin squares. This property results in full, balanced parallelism. A secondary benefit of multipartitionings is that they induce only coarse-grain communication.All of the multipartitionings described in the literature to date assign only one tile per processor per hyperplane of a multipartitioning (

latin hyper-cube). While this class of multipartitionings is optimal for two dimensions, in three dimensions it requires the number of processors to be a perfect square. This paper considers the general problem of computing optimal multipartitionings for multi-dimensional data volumes on an arbitrary number of processors. We describe an algorithm to compute ad-dimensional multipartitioning of a multi-dimensional array for an arbitrary number of processors. When using a multipartitioning to parallelize a line sweep computation, the best partitioning is the one that exploits all of the processors and has the smallest communication volume. To compute the best multipartitioning of a multi-dimensional array, we describe a cost model for selectingd, the dimensionality of the best partitioning, and the number of cuts along each partitioned dimension. In practice, our technique will choose a 3-dimensional multipartitioning for a 3-dimensional line-sweep computation, except whenpis a prime; previously, a 3-dimensional multipartitioning could be applied only when sqrt(p) is integral.Finally, we describe a prototype implementation of generalized multipartitioning in the Rice dHPF compiler and performance results obtained when using it to parallelize a line sweep computation for different numbers of processors.

Keywords:Generalized latin squares, partitions of integers, loop parallelization, array mapping, High Performance Fortran.