Ram Rajamony: Prescriptive performance tuning

Prescriptive Performance Tuning: The Rx Approach


Programmers often rely on performance analysis tools to improve the performance of parallel programs. Typically, these tools report on the program's execution, such as the time spent in different routines, and on the factors that slow down progress, such as cache misses. However, this kind of feedback is far from satisfactory. Feedback such as the time spent in different routines is purely descriptive: it shows where time went, not what to change. Furthermore, there is a considerable gap between machine-level feedback, such as cache misses, and the source program. Consequently, the cause-and-effect relationship between the source code and the tool's feedback is difficult to ascertain, which makes it hard for the user to infer the true cause of poor performance.


As part of my dissertation work, I have developed a new approach for building performance tools that can overcome these problems in several domains. The key idea is that by satisfying a set of basic requirements, a performance tool can prescribe source-level changes to improve performance. These requirements are:

This approach can be used to design prescriptive tools for a wide range of problem domains, such as reducing inter-process interactions in concurrent programs, improving the cache behavior of sequential programs by changing their data layout, and identifying the best communication primitives to use in message-passing programs. The advantage of basing a tool on this approach is that even novice programmers can use it to correct performance problems, because it requires only source-level, rather than architecture-related, reasoning about the program.

Rx is one such tool that I have developed to improve the performance of explicitly parallel shared-memory programs. Rx automatically analyzes run-time data from program executions to prescribe transformations at the source-code level. These transformations target synchronization and some forms of data communication, two significant sources of overhead in shared-memory applications. A correctness framework ensures that transformations derived from one execution, or from a small set of executions, remain valid for all executions. In a few cases, feedback from Rx has made a crucial difference, enabling applications that originally slowed down on multiple processors to achieve a speedup.
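To make the idea of turning run-time data into a source-level prescription more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Rx itself: the trace format (`LockEvent` with `lock`, `site`, and `thread` fields) and the `prescribe` function are hypothetical, and the single rule it applies (flag a lock that only one thread ever acquired) is a simple stand-in for the synchronization transformations Rx prescribes. Unlike Rx, it has no correctness framework, so its advice holds only for the executions it was given.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class LockEvent(NamedTuple):
    """One lock acquisition recorded at run time (hypothetical trace format)."""
    lock: str    # name of the lock variable in the source program
    site: str    # source location of the acquire, e.g. "stats.c:10"
    thread: int  # id of the thread that performed the acquire


def prescribe(trace: Iterable[LockEvent]) -> List[str]:
    """Turn a run-time lock trace into source-level suggestions.

    A lock that only one thread acquired in the observed run(s) guarded no
    inter-thread interaction there, so its acquire sites are reported as
    candidates for removal. Only the supplied executions are considered;
    nothing here proves the advice holds for every execution.
    """
    threads: Dict[str, Set[int]] = defaultdict(set)  # lock -> acquiring threads
    sites: Dict[str, Set[str]] = defaultdict(set)    # lock -> acquire sites
    for ev in trace:
        threads[ev.lock].add(ev.thread)
        sites[ev.lock].add(ev.site)

    advice: List[str] = []
    for lock in sorted(threads):
        if len(threads[lock]) == 1:
            where = ", ".join(sorted(sites[lock]))
            advice.append(
                f"lock '{lock}' used by one thread only; "
                f"consider removing it at {where}"
            )
    return advice


if __name__ == "__main__":
    # Hypothetical trace: qlock is shared by threads 0 and 1,
    # stats_lock is only ever taken by thread 2.
    trace = [
        LockEvent("qlock", "queue.c:42", 0),
        LockEvent("qlock", "queue.c:57", 1),
        LockEvent("stats_lock", "stats.c:10", 2),
        LockEvent("stats_lock", "stats.c:10", 2),
    ]
    for line in prescribe(trace):
        print(line)
```

For the sample trace, `prescribe` returns one suggestion, pointing at `stats.c:10`, and says nothing about `qlock` because two threads contend for it. The output is phrased in terms of source locations rather than machine events, which is the point of a prescriptive tool; a real tool would still need to establish that the lock is unshared in every execution before the change is safe.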



Ramakrishnan Rajamony
E-mail: (MyLastName) at us.ibm.com [please do not e-mail me at cs.rice.edu]
Last updated at 17:59 CST on Wednesday, April 01, 1998