Yuri Dotsenko, Cristian Coarfa, John Mellor-Crummey, Daniel Chavarria-Miranda. Experiences with Co-Array Fortran on Hardware Shared Memory Platforms. In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2004), West Lafayette, Indiana, September 2004.

Source-to-source translation is an important code generation strategy commonly used by parallelizing compilers and compilers for parallel languages. In this paper, we investigate the performance impact of using different Fortran 90 representations for local and remote accesses on scalable shared-memory multiprocessors when generating SPMD code for Co-Array Fortran (CAF). Our aim is to deliver the full power of the hardware platform to the application, both when operating on local data and when accessing remote data using either coarse-grain or fine-grain communication. We explored several representations for shared data and several ways of implementing communication. Using CAF variants of the STREAM, Random Access, and NAS MG and SP benchmarks, we compared library-based implementations of one-sided communication with fine-grain communication that uses pointers to access remote data through load and store operations. Our experiments showed that pointer-based fine-grain communication improved performance by as much as a factor of 24 on an SGI Altix and by as much as a factor of five on an SGI Origin, when pointer initialization is hoisted out of the loop performing the data accesses.
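The contrast between library-based and pointer-based fine-grain access can be illustrated with a minimal sketch. It is written in C rather than the Fortran 90 that the paper's translator generates, and it is not the authors' generated code: `toy_remote_array`, `toy_get`, `sum_via_library`, and `sum_via_hoisted_pointer` are hypothetical names introduced only for illustration, and the "remote" data is simply an array in the same address space, as it would be addressable on a hardware shared-memory machine. The sketch shows the shape of the two access styles only; it makes no performance claim.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for co-array data owned by another image.
   Assumption: on hardware shared memory it is directly addressable. */
typedef struct {
    const double *base;  /* start of the other image's data */
    size_t        count; /* number of elements */
} toy_remote_array;

/* Hypothetical library-style one-sided get: one call per transfer. */
void toy_get(double *dst, const toy_remote_array *src,
             size_t index, size_t nelems)
{
    assert(index + nelems <= src->count);
    memcpy(dst, src->base + index, nelems * sizeof *dst);
}

/* Fine-grain access through the library: one call for every element. */
double sum_via_library(const toy_remote_array *src)
{
    double sum = 0.0;
    for (size_t i = 0; i < src->count; ++i) {
        double tmp;
        toy_get(&tmp, src, i, 1);
        sum += tmp;
    }
    return sum;
}

/* Fine-grain access through a pointer initialized once, before the loop. */
double sum_via_hoisted_pointer(const toy_remote_array *src)
{
    const double *remote = src->base; /* hoisted pointer initialization */
    double sum = 0.0;
    for (size_t i = 0; i < src->count; ++i)
        sum += remote[i];             /* plain load, no library call */
    return sum;
}
```

Both functions compute the same result. The difference is that the second one pays the cost of locating the remote data once, and the loop body then consists of ordinary loads that a back-end compiler can optimize like any other array traversal.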