Authors
Rich Vuduc, James W Demmel, Katherine A Yelick, Shoaib Kamil, Rajesh Nishtala, Benjamin Lee
Publication date
2002/11/16
Conference
SC'02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing
Pages
26-26
Publisher
IEEE
Description
We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpM×V), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how closely tuned code approaches these limits. Specifically, we develop upper and lower bounds on the performance (Mflop/s) of SpM×V when tuned using our previously proposed register blocking optimization. These bounds are based on the non-zero pattern in the matrix and the cost of basic memory operations, such as cache hits and misses. We evaluate our tuned implementations with respect to these bounds using hardware counter data on 4 different platforms and on test set of 44 sparse matrices. We find that we can often get within 20% of the upper bound, particularly on class of matrices from finite element …
Total citations
20022003200420052006200720082009201020112012201320142015201620172018201920202021202220232024212141415918101013157109435463979
Scholar articles
R Vuduc, JW Demmel, KA Yelick, S Kamil, R Nishtala… - SC'02: Proceedings of the 2002 ACM/IEEE Conference …, 2002