Unit 4.5.6 High-Performance Computing Beyond Matrix-Matrix Multiplication
In this course, we took one example, and used that example to illustrate various issues encountered when trying to achieve high performance. The course is an inch wide and a mile deep!
For materials that treat the subject of high-performance computing more broadly, you may find [8] interesting:
Victor Eijkhout, Introduction to High-Performance Scientific Computing.