HPCA '96

A Comparison of Entry Consistency and Lazy Release Consistency Implementations

by Sarita V. Adve, Alan L. Cox, Sandhya Dwarkadas, Ramakrishnan Rajamony, and Willy Zwaenepoel. In Proceedings of the 2nd International Symposium on High-Performance Computer Architecture, February 1996.

Abstract

This paper compares several implementations of entry consistency (EC) and lazy release consistency (LRC), two relaxed memory models used in software distributed shared memory (DSM) systems. We use six applications in our study: SOR, Quicksort, Water, Barnes-Hut, IS, and 3D-FFT. For these applications, EC's requirement that all shared data be associated with a synchronization object leads to a fair amount of additional programming effort. In particular, we identify extra synchronization, lock rebinding, and object granularity as sources of extra complexity. In terms of performance, for this set of applications and the computing environment used, neither model is consistently better than the other. For SOR and IS, execution times are about the same, but LRC is faster for Water (33%) and Barnes-Hut (41%), and EC is faster for Quicksort (14%) and 3D-FFT (10%).

Among the implementations of EC and LRC, we independently vary the method for write trapping and the method for write collection. Our goal is to separate implementation issues from any particular model. We consider write trapping by compiler instrumentation of the code and by twinning (comparing the current version of shared data with an older version). Write collection is done either by scanning timestamps or by building diffs, which are records of the changes to shared data. For write trapping in EC, twinning is faster if data is shared at the granularity of a single word. For granularities larger than a word, compiler instrumentation is faster. For write trapping in LRC, twinning gives the best performance for all applications. For write collection in EC, timestamping works best in applications dominated by migratory data, while diffing works best for other data. For LRC, increased communication overhead in transmitting timestamps becomes an additional factor working in favor of diffing for applications with fine-grain sharing.
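
As a rough illustration of the twinning and diffing terms used above, the following is a minimal sketch in Python. It is not the paper's implementation: the function names (make_twin, build_diff, apply_diff) and the modelling of a shared region as a list of integer words are assumptions made only for this example.

```python
from typing import Dict, List


def make_twin(region: List[int]) -> List[int]:
    """Save a copy (the "twin") of a shared region before it is modified."""
    return list(region)


def build_diff(twin: List[int], region: List[int]) -> Dict[int, int]:
    """Compare the region word by word with its twin.

    Returns {word index: new value} for every word whose value changed.
    A write that stores the value already present is not detected,
    which is inherent to comparison against a twin.
    """
    return {
        i: new
        for i, (old, new) in enumerate(zip(twin, region))
        if old != new
    }


def apply_diff(region: List[int], diff: Dict[int, int]) -> None:
    """Update another copy of the region with the recorded changes."""
    for index, value in diff.items():
        region[index] = value


# Hypothetical usage: one process modifies two words of an 8-word region,
# and a second copy is brought up to date by sending only the diff.
local = [0] * 8
twin = make_twin(local)
local[2] = 7
local[5] = 9
diff = build_diff(twin, local)

remote = [0] * 8
apply_diff(remote, diff)
```

In this sketch only the two modified words travel in the diff, rather than the whole region, which is the property that makes diffs attractive when a small part of a region changes.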

Paper: compressed PostScript (69 KBytes)

Here are the (color) slides from the presentation I gave at the conference: compressed PostScript (46 KBytes)


Ramakrishnan Rajamony
E-mail: (MyLastName) at us.ibm.com [please do not e-mail me at cs.rice.edu]
Last updated at 23:22 CST on Wednesday, November 19, 1997