Ram Rajamony: Prescriptive performance tuning
Prescriptive Performance Tuning: The Rx Approach
Programmers often rely on performance analysis tools to improve the
performance of parallel programs. Typically, these tools provide feedback
about the program execution, such as the time spent in different routines,
and the factors that slow down progress, such as cache misses. However, the
nature of this feedback is far from satisfactory. Feedback such as the time
spent in different routines is purely descriptive: it reports what happened
but not what to change. Furthermore, the gap between machine-level feedback,
such as cache misses, and the source program is considerable. Consequently,
the cause-and-effect relationship between the source code and the tool
feedback is difficult to ascertain. This makes it hard for the user to infer
the true cause of poor performance.
As part of my dissertation work, I have developed a new approach for
building performance tools that can overcome these problems in several
domains. The key idea is that by satisfying a set of basic requirements, a
performance tool can prescribe source-level changes to improve
performance. These requirements are:
- Automatically analyze run-time data to derive feedback
- Correlate the feedback with the source program
- Provide a framework to establish correctness of the feedback
This approach can be used to design prescriptive tools for a wide range
of problem domains, such as reducing inter-process interactions in
concurrent programs, improving the cache behavior of sequential programs by
changing the data layout, and indicating the best communication primitives
to use in message-passing programs. A tool based on this approach can be
used by novice programmers to correct performance problems, because it
requires only source-level, as opposed to architecture-related, reasoning
about the program.
Rx is one such tool that I have developed to improve the
performance of explicitly parallel shared-memory programs. Rx
automatically analyzes run-time data from program executions to prescribe
transformations at the source-code level. These transformations target
synchronization and some forms of data communication, two significant
sources of overhead in shared-memory applications. A correctness framework
ensures that transformations obtained from one or a small set of executions
will be applicable to all executions. In a few cases, feedback from
Rx has made a crucial difference, enabling applications that were
originally slowing down on multiple processors to achieve a speedup.
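To make the idea of deriving a prescription from run-time data concrete, the
following is a minimal sketch and not Rx's actual algorithm. It assumes a
hypothetical trace format (the `Access` record) in which every shared-memory
access is tagged with the barrier-delimited epoch it occurred in, and it
flags a barrier as a removal candidate when no two processes touched the
same address across it with at least one write. The function name, the trace
layout, and the example addresses are all illustrative assumptions.

```python
from collections import namedtuple

# Hypothetical trace record (an assumption, not Rx's format): the epoch is
# the number of barriers the process had passed when it made the access.
Access = namedtuple("Access", "epoch pid addr is_write")


def removable_barriers(trace, num_barriers):
    """Return indices of barriers that ordered no cross-process dependence.

    Barrier b separates epoch b from epoch b + 1. Each barrier is judged in
    isolation: removing several adjacent barriers together would require
    re-checking the merged epochs. The answer holds only for the traced
    execution; generalizing it needs a separate correctness argument.
    """
    removable = []
    for b in range(num_barriers):
        before = [a for a in trace if a.epoch == b]
        after = [a for a in trace if a.epoch == b + 1]
        # A conflict is two accesses to the same address by different
        # processes, at least one of which is a write.
        conflict = any(
            x.addr == y.addr and x.pid != y.pid and (x.is_write or y.is_write)
            for x in before
            for y in after
        )
        if not conflict:
            removable.append(b)
    return removable


# Illustrative trace with two processes and two barriers.
trace = [
    Access(0, 0, 100, True),   # epoch 0: process 0 writes address 100
    Access(0, 1, 200, True),   # epoch 0: process 1 writes address 200
    Access(1, 0, 100, False),  # epoch 1: process 0 reads its own data
    Access(1, 1, 200, False),  # epoch 1: process 1 reads its own data
    Access(1, 1, 300, True),   # epoch 1: process 1 writes address 300
    Access(2, 0, 300, False),  # epoch 2: process 0 reads process 1's data
]

print(removable_barriers(trace, 2))  # barrier 0 is a candidate, barrier 1 is not
```

In this example, barrier 0 separates accesses that each process makes only
to its own data, so the sketch reports it as a candidate for removal, while
barrier 1 orders a write by one process before a read by another and is
kept. The output is phrased in terms of barriers in the program rather than
machine-level events, which is the kind of source-level feedback described
above.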
Relevant Publications:
- A Performance Debugger for Eliminating Excess Synchronization in
  Shared-memory Parallel Programs, with Alan Cox, Proceedings of the
  Fourth International Workshop on Modeling, Analysis, and Simulation of
  Computer and Telecommunication Systems (MASCOTS), February 1996.
- Performance Debugging Shared Memory Parallel Programs Using Run-Time
  Dependence Analysis, with Alan Cox, Proceedings of the 1997 ACM
  SIGMETRICS International Conference on Measurement and Modeling of
  Computer Systems, June 1997, Seattle, WA. Winner of the Best Student
  Paper Award.
- Prescriptive Performance Tuning: The Rx Approach, Ph.D. Thesis, Rice
  University, January 1998.
Ramakrishnan Rajamony
E-mail: (MyLastName) at us.ibm.com [please do not e-mail me at cs.rice.edu]
Last updated at 17:59 CST on Wednesday, April 01, 1998