John Mellor-Crummey and John Garvin. Optimizing Sparse Matrix Vector Multiply using Unroll-and-jam. Proceedings of the Los Alamos Computer Science Institute Third Annual Symposium, October 2002, Santa Fe, NM. Published on CD-ROM. [pdf]

Large-scale scientific applications frequently compute sparse matrix vector products in their computational core. For this reason, techniques for computing sparse matrix vector products efficiently on modern architectures are important. This paper describes a strategy for improving the performance of sparse matrix vector product computations using a loop transformation known as unroll-and-jam. We describe a novel sparse matrix representation that enables us to apply this transformation. Our approach is best suited for sparse matrices whose rows have a small number of predictable lengths. This work was motivated by sparse matrices that arise in SAGE, an ASCI application from Los Alamos National Laboratory. We evaluate the performance benefits of our approach using sparse matrices produced by SAGE for a pair of sample inputs. We show that our strategy is effective for improving sparse matrix vector product performance using these matrices on MIPS R12000, Alpha Ev67, IBM Power 3, and Itanium processors. Our measurements show that for this class of sparse matrices, our strategy improves sparse matrix vector product performance from a low of 11% on the MIPS R12000 to well over a factor of two on the Itanium.
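
To illustrate the idea, the following C sketch contrasts a baseline CSR sparse matrix vector product with a hypothetical variant for a group of rows that all share one known length. The fixed-length storage layout, the unroll factor of 2, and the example row length are illustrative assumptions, not the representation or parameters used in the paper; the sketch only shows how unrolling the row loop and jamming the inner loops exposes independent multiply-adds.

```c
#include <stddef.h>

/* Baseline CSR sparse matrix vector product: y = A*x.
 * row_ptr has n+1 entries; col_idx/val hold the nonzeros row by row. */
void spmv_csr(size_t n, const size_t *row_ptr, const size_t *col_idx,
              const double *val, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Hypothetical unroll-and-jam variant for rows of one fixed length LEN:
 * the loop over rows is unrolled by 2 and the two inner loops are jammed,
 * so each iteration issues two independent multiply-add chains. */
#define LEN 7   /* example fixed row length; an assumption for illustration */

void spmv_fixed_len_unroll2(size_t nrows, const size_t *col_idx,
                            const double *val, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 1 < nrows; i += 2) {
        const size_t *c0 = col_idx + i * LEN, *c1 = c0 + LEN;
        const double *v0 = val     + i * LEN, *v1 = v0 + LEN;
        double sum0 = 0.0, sum1 = 0.0;
        for (size_t k = 0; k < LEN; k++) {   /* jammed inner loops */
            sum0 += v0[k] * x[c0[k]];
            sum1 += v1[k] * x[c1[k]];
        }
        y[i]     = sum0;
        y[i + 1] = sum1;
    }
    if (i < nrows) {                          /* leftover row, if nrows is odd */
        const size_t *c = col_idx + i * LEN;
        const double *v = val + i * LEN;
        double sum = 0.0;
        for (size_t k = 0; k < LEN; k++)
            sum += v[k] * x[c[k]];
        y[i] = sum;
    }
}
```

In this sketch, matrices whose rows fall into a few predictable lengths would be handled by dispatching each group of equal-length rows to a specialized loop like the one above, which removes the per-row loop overhead and gives the compiler more independent operations to schedule.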