Unit 4.2.3 Nicholai Tukanov, IBM/CMU Power10: MMA and GEMM in BLIS
"In conclusion" slide
Details relevant to the video and discussion:
POWER10 Simulator:
https://www14.software.ibm.com/webapp/set2/sas/f/pwrfs/pwr10/home.html
OpenBLAS:
https://github.com/xianyi/OpenBLAS
GCC Built-Ins:
https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Matrix-Multiply-Assist-Built-in-Functions.html
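The GCC page above lists the MMA built-ins. For fp32, __builtin_mma_xvf32gerpp performs a rank-1 update of a 4x4 accumulator: acc += x * y^T, where x and y each hold four floats. The following is a pure-Python model of that idea, not MMA code. It assumes a single 4x4 fp32 accumulator and shows how a BLIS-style micro-kernel computes a 4x4 block of C from packed panels with one rank-1 update per iteration over k. The names acc_zero, acc_ger_pp and microkernel_4x4 are hypothetical. Production POWER10 kernels typically keep several accumulators live and compute larger blocks of C.

```python
# Pure-Python model of one 4x4 fp32 MMA accumulator (hypothetical names).
# On POWER10 hardware the corresponding GCC built-ins are
# __builtin_mma_xxsetaccz (zero) and __builtin_mma_xvf32gerpp (rank-1 update).

MR = NR = 4  # one fp32 accumulator holds a 4x4 block of C


def acc_zero():
    """Return a zeroed 4x4 accumulator (models __builtin_mma_xxsetaccz)."""
    return [[0.0] * NR for _ in range(MR)]


def acc_ger_pp(acc, x, y):
    """In-place rank-1 update acc += x * y^T (models __builtin_mma_xvf32gerpp)."""
    for i in range(MR):
        for j in range(NR):
            acc[i][j] += x[i] * y[j]


def microkernel_4x4(k, a_packed, b_packed, c):
    """C (4x4, list of rows) += A_panel (4 x k) * B_panel (k x 4).

    a_packed[p] holds column p of A_panel and b_packed[p] holds row p of
    B_panel, so each iteration reads two contiguous length-4 vectors, as a
    BLIS micro-kernel does with its packed micro-panels.
    """
    acc = acc_zero()
    for p in range(k):
        acc_ger_pp(acc, a_packed[p], b_packed[p])
    # Write the accumulator back to C once, after all k rank-1 updates.
    for i in range(MR):
        for j in range(NR):
            c[i][j] += acc[i][j]


if __name__ == "__main__":
    k = 3
    # A(i, p) = i + 1 and B(p, j) = j + p, stored in packed form.
    a_packed = [[float(i + 1) for i in range(MR)] for p in range(k)]
    b_packed = [[float(j + p) for j in range(NR)] for p in range(k)]
    c = [[0.0] * NR for _ in range(MR)]
    microkernel_4x4(k, a_packed, b_packed, c)
    print(c[0])  # prints [3.0, 6.0, 9.0, 12.0]
```

The block of C stays in the accumulator for the whole loop over k and is written back only once. This is the register-level reuse that makes GEMM compute-bound rather than memory-bound.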
POWER10 Hardware Specs:
602 mm² die, 18B transistors, Samsung 7 nm process.
18-layer metal stack.
Single- or dual-chip sockets
Up to 15 SMT8 cores at 4+ GHz (16 physically on the die, with 15 enabled for better yield)
EA-tagged (effective-address-tagged) L1 cache, 1.5x larger
4x larger L2 cache, 120 MB L3 cache, low-latency NUCA
2x general-purpose SIMD and 4x matrix SIMD performance vs. POWER9
1 TB/s PowerAXON + 1 TB/s OMI
Up to 4 TB of RAM per socket; 410 GB/s with DDR4, DDR5 later.
PCIe Gen 5.