Subsection 12.3.1 Blocked Cholesky factorization
In the following video, we demonstrate how high-performance algorithms can be quickly translated to code using the FLAME abstractions. It is a long video that was recorded in a single sitting and has not been edited. (You need not watch the whole video if you "get the point.") The purpose is to convey the importance of programming in a way that reflects how one naturally derives and explains an algorithm. In the next unit, you will get to try such an implementation yourself, for the LU factorization.
The notes to which this video refers can be found at
http://www.cs.utexas.edu/users/flame/Notes/NotesOnChol.pdf

You can find all the implementations that are created during the video in the directory Assignments/Week12/Chol/. They have been updated slightly since the video was created in 2011. In particular, the Makefile was changed so that now the BLIS implementation of the BLAS is used rather than OpenBLAS.

The Spark tool that is used to generate code skeletons can be found at
http://www.cs.utexas.edu/users/flame/Spark/
The following references may be useful:
A Quick Reference Guide to the FLAME API to BLAS functionality can be found at
http://www.cs.utexas.edu/users/flame/pubs/FLAMEC-BLAS-Quickguide.pdf
[46] Field Van Zee, libflame: The Complete Reference, http://www.lulu.com, 2009.