Apan Qasem, Guohua Jin, and John Mellor-Crummey. Improving Performance with Integrated Program Transformations. Technical Report TR03-419, Dept. of Computer Science, Rice University, October 2003. [pdf]

Achieving a high fraction of peak performance on today's computer systems is difficult for complex scientific applications. To do so, an application's characteristics must be tailored to exploit the characteristics of its target architecture. Today, commercial compilers do not adequately tailor programs automatically; thus, application scientists must settle for lackluster performance or manually transform their codes into a form that is complex and unmaintainable. In this paper, we describe a prototype source-to-source transformation tool that enables application scientists to achieve high performance for scientific codes without changing their natural coding style. Our tool supports a rich, integrated collection of optimizing transformations and provides users with precise control over how these optimizations should be applied. In preliminary experiments with the Runge-Kutta advection core from the NCOMMAS code for mesoscale weather modeling and Livermore Loop 18, we have used our tool to double single-processor performance.