Subsection 12.3.5 Sparse algorithms
¶Iterative methods inherently perform few floating point operations relative to the memory operations that need to be performed. For example, the Conjugate Gradient Method discussed in Section 8.3 typically spends most of its time in a sparse matrix-vector multiplication, where only two floating point operations are performed per nonzero element in the matrix. As a result, attaining high performance with such algorithms is inherently difficult.
A (free) text that gives a nice treatment of high performance computing, including sparse methods, is
[17] Victor Eijkhout, Introduction to High-Performance Scientific Computing, lulu.com.
http://pages.tacc.utexas.edu/~eijkhout/istc/istc.html