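The above approach will not, in general, yield high performance,
since all local computation is performed with matrix-vector operations.
Such operations perform O(n^2) floating-point operations on O(n^2) data,
so their speed is limited by memory bandwidth rather than by the processor.
Better (near-peak) performance can be attained by replacing
matrix-vector multiplication with matrix-panel-of-vectors multiplication, and
rank-1 updates with rank-k updates. These perform roughly O(k) operations
per data item (for k much smaller than the matrix dimensions), so data
can be reused from cache. The algorithms outlined above can easily be modified
to accommodate this, as will be explained in detail in Chapter .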
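The contrast between rank-1 and rank-k updates can be illustrated with a minimal pure-Python sketch. It is not the library's code, and the function names `rank1_update`, `rankk_update`, and `flops_per_element` are hypothetical. Interpreted Python loops will not show cache effects, so the sketch only checks that one rank-k update A := A + X Y^T gives the same result as k successive rank-1 updates, and counts the ratio of flops to matrix entries touched, which is what lets a tuned level-3 kernel approach peak speed.

```python
def rank1_update(A, x, y):
    """A := A + x y^T in place (level-2: every entry of A is touched for 2 flops)."""
    for i, row in enumerate(A):
        xi = x[i]
        for j in range(len(row)):
            row[j] += xi * y[j]


def rankk_update(A, X, Y):
    """A := A + X Y^T in place, X is m x k and Y is n x k (level-3 style).

    Each entry of A is loaded and stored once while all k contributions
    are accumulated, instead of once per rank-1 update.
    """
    k = len(X[0])
    for i, row in enumerate(A):
        Xi = X[i]
        for j in range(len(row)):
            Yj = Y[j]
            s = 0.0
            for p in range(k):
                s += Xi[p] * Yj[p]
            row[j] += s


def flops_per_element(m, n, k):
    """Flops divided by entries of A, X, Y touched for an m x n rank-k update."""
    flops = 2 * m * n * k
    elements = m * n + (m + n) * k
    return flops / elements


if __name__ == "__main__":
    m, n, k = 4, 3, 2
    A1 = [[float(i + j) for j in range(n)] for i in range(m)]
    A2 = [row[:] for row in A1]
    X = [[float(i - p) for p in range(k)] for i in range(m)]
    Y = [[float(j * p + 1) for p in range(k)] for j in range(n)]

    # k separate rank-1 updates, one per column of X and Y
    for p in range(k):
        rank1_update(A1, [X[i][p] for i in range(m)], [Y[j][p] for j in range(n)])
    # a single rank-k update
    rankk_update(A2, X, Y)
    print(A1 == A2)

    # The flop-to-data ratio grows roughly like 2k for k << m, n
    for kk in (1, 32, 128):
        print(kk, round(flops_per_element(1000, 1000, kk), 1))
```

For k = 1 the ratio stays near 2 flops per entry regardless of matrix size, which is why a sequence of rank-1 updates is bound by memory traffic; grouping k of them into one rank-k update raises it roughly in proportion to k.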