Skip to main content
LAFF-On Programming for High Performance:
ulaff.net
Robert van de Geijn, Margaret Myers, Devangi Parikh
Contents
Index
Prev
Up
Next
Contents
Prev
Up
Next
Front Matter
Colophon
Dedication
Acknowledgements
Preface
0
Getting Started
Opening Remarks
Navigating the Course
Setting Up
Experimental Setup
Enrichments
Wrap Up
1
Loops and More Loops
Opening Remarks
Loop Orderings
Layering Matrix-Matrix Multiplication
Layering Matrix-Matrix Multiplication: Alternatives
Enrichments
Wrap Up
2
Start Your Engines
Opening Remarks
Blocked Matrix-Matrix Multiplication
Blocking for Registers
Optimizing the Micro-kernel
Enrichments
Wrap Up
3
Pushing the Limits
Opening Remarks
Leveraging the Caches
Packing
Further Tricks of the Trade
Enrichments
Wrap Up
4
Multithreaded Parallelism
Opening Remarks
OpenMP
Multithreading Matrix Multiplication
Parallelizing More
Enrichments
Wrap Up
Back Matter
A
B
GNU Free Documentation License
References
Index
Colophon
Authored in PreTeXt
permalink
Section
2.4
Optimizing the Micro-kernel
ΒΆ
2.4.1
Vector registers and instructions
2.4.2
Implementing the micro-kernel with vector instructions
2.4.3
Details
2.4.4
More options
2.4.5
Optimally amortizing data movement
login