BLIS Retreat 2019
Contributed talks
- Cris Cecka, NVIDIA
  Title: Programming GPUs for speed-of-light linear algebra
- Marat Dukhan, Google Research
  Title: Indirect GEMM and Indirect Convolution Algorithm

  Abstract: Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms, in which convolution is built on top of the GEMM primitive provided by highly optimized BLAS libraries. Convolutions with 1x1 kernels can be expressed directly as a GEMM call, but convolutions with larger kernels require a special memory layout transformation - im2col or im2row - to fit the GEMM interface. The Indirect Convolution algorithm provides the efficiency of the GEMM primitive without the overhead of the im2col transformation. In contrast to GEMM-based algorithms, Indirect Convolution does not reshuffle the data to fit the GEMM primitive; instead, it introduces an indirection buffer - a buffer of pointers to the start of each row of image pixels. This broadens the application of the modified GEMM function to convolutions with arbitrary kernel size, padding, stride, and dilation.
  Paper on arXiv.
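As a rough illustration of the indirection-buffer idea described in the abstract, here is a NumPy sketch (the function name, shapes, and variable names are my own, not from the paper or any library): row indices stand in for the pointers of the real implementation, and the gathered rows feed a single GEMM, with no im2col buffer in the chosen layout.

```python
import numpy as np

def indirect_conv2d(x, w, stride=1):
    """Sketch of 2D convolution via an indirection buffer (valid padding).

    x: input of shape (H, W, C); w: weights of shape (KH, KW, C, F).
    Returns output of shape (OH, OW, F).
    """
    H, W, C = x.shape
    KH, KW, _, F = w.shape
    OH = (H - KH) // stride + 1
    OW = (W - KW) // stride + 1

    rows = x.reshape(H * W, C)  # each input pixel is a length-C row

    # Indirection buffer: for every output pixel m and kernel position k,
    # record the index of the input row it reads (a stand-in for a pointer).
    indir = np.empty((OH * OW, KH * KW), dtype=np.intp)
    for oy in range(OH):
        for ox in range(OW):
            m = oy * OW + ox
            for ky in range(KH):
                for kx in range(KW):
                    iy, ix = oy * stride + ky, ox * stride + kx
                    indir[m, ky * KW + kx] = iy * W + ix

    # "GEMM over gathered rows": in the real algorithm the micro-kernel
    # dereferences the pointers directly; NumPy fancy indexing copies here.
    a = rows[indir].reshape(OH * OW, KH * KW * C)
    out = a @ w.reshape(KH * KW * C, F)
    return out.reshape(OH, OW, F)
```

Note that the indirection buffer depends only on the input geometry, so it can be built once and reused across GEMM calls on inputs of the same shape.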
- Albert Cohen, Google
  Title: What MLIR has to offer as a toolkit for building and leveraging numerical libraries in ML and HPC
- Thomas Hines, Tennessee Tech
  Title: Issues with fat by thin matrix multiplication
- Jianyu Huang, Facebook
  Title: FBGEMM: High-Performance Low-Precision Library for Deep Learning Inference
- Tze Meng Low, CMU
  Title: Analytical models for MMM-like problems on GPUs
- Devin Matthews, SMU
  Title: GEMM-Based Kernels for Tensor Hypercontraction
- John McCalpin, TACC
  Title: What You Don’t Know Can Hurt Performance — Snoop Filters in Intel Xeon Scalable Processors
- Christos Psarras, RWTH Aachen
  Title: The Linear Algebra Mapping Problem
- Martin Schatz, Facebook
  Title: FLAME in Machine Learning (ML) Applications
- Tyler Smith, ETH Zurich
  Title: I/O Lower Bounds for Small MMM
- Nicholai Tukanov, UT-Austin
  Title: Mapping BLIS to the IBM Power9 architecture
- Field Van Zee, UT-Austin
  Title: The BLIS Approach to Skinny Matrix Multiplication
- Kiran Varaganti, AMD
  Title: BLIS optimizations and results on AMD Rome (tentative title)
 
