Project contacts: Roshan Dathathri
        
        
Project description:
Affine loop nests are arbitrarily nested loop nests in which the
array accesses and the loop bounds are affine functions (linear
functions with a constant offset) of the program parameters. They
form the compute-intensive core of scientific computations like
linear-algebra kernels and stencil-style computations. Compilers
(using the polyhedral model) can statically analyze these loop
nests and generate parallel, tiled code without programmer
intervention. However, the performance of the generated code is
highly sensitive to the tile sizes, which depend on the machine
architecture (but not on the problem sizes). So the programmer has
to choose the right tile sizes to get the best performance, and
exhaustively searching for the best tile sizes (auto-tuning) is
time-consuming.
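
To make tiling concrete, here is a minimal Python sketch of a tiled
matrix-multiplication loop nest. Production code would be generated C
or Fortran, but the loop structure is the same, and the tile size T is
exactly the parameter this project aims to predict.

    import numpy as np

    def matmul_tiled(A, B, T):
        # Tiled matrix multiplication: C = A @ B.
        # The three outer loops walk over T x T tiles; the inner
        # loops do the work inside one tile. Performance depends
        # strongly on T relative to the machine's cache sizes.
        n = A.shape[0]
        C = np.zeros((n, n))
        for ii in range(0, n, T):                          # tile loops
            for jj in range(0, n, T):
                for kk in range(0, n, T):
                    for i in range(ii, min(ii + T, n)):    # intra-tile loops
                        for j in range(jj, min(jj + T, n)):
                            s = C[i, j]
                            for k in range(kk, min(kk + T, n)):
                                s += A[i, k] * B[k, j]
                            C[i, j] = s
        return C

    A = np.random.rand(128, 128)
    B = np.random.rand(128, 128)
    assert np.allclose(matmul_tiled(A, B, 32), A @ B)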
        
Analytical models have been proposed to compute the best tile
sizes. These models meticulously analyze both machine architecture
features and program features to find an analytical function that
gives the optimal tile sizes in terms of the values of those
features. Yotov et al. in PLDI 2003 proposed an analytical model
for determining optimal tile sizes for matrix multiplication, and
showed that it can do almost as well as exhaustive search. However,
their model is specific to one problem: matrix multiplication.
Extending it to other affine loop nests would require analyzing
each problem independently.
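
For intuition only, here is a toy capacity-style heuristic in Python.
It is not Yotov et al.'s model (which also accounts for cache line
size, associativity, register tiling, and loop order), but it shows
the flavor of deriving a tile size analytically from a machine feature.

    import math

    def capacity_tile_size(cache_bytes, elem_bytes=8, arrays=3):
        # Largest T such that `arrays` tiles of T x T elements fit in
        # the cache, i.e. arrays * T^2 * elem_bytes <= cache_bytes.
        return math.isqrt(cache_bytes // (arrays * elem_bytes))

    # A 32 KiB L1 data cache suggests T ~ 36 for double-precision MMM.
    print(capacity_tile_size(32 * 1024))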
        
Recent work has tried to use machine learning techniques to learn
optimal tile sizes automatically. Cummins et al. in ADAPT 2016 use
classifiers and regressors to learn tile sizes for stencil GPU
kernels. The advantage is that the best tile sizes for different
stencil problems can be learned automatically. Neural networks seem
well suited to learning tile sizes because the analytical functions
for optimal tile sizes are usually nonlinear.
        
The goal of this project is to build a machine learning system -
possibly based on deep neural networks - that takes affine loop
nest features and architecture features as inputs and predicts the
best tile size for that loop nest. To begin with, you can restrict
yourself to perfectly nested loops. Here are things to think about.
        
  - Loop nest features and architectural features relevant for
    optimal tile size determination: you can look at the Yotov et
    al. analytical model to see what features they used for MMM.
    The Cummins et al. machine learning model might also provide
    insight. You can also investigate techniques from machine
    learning for finding relevant features automatically.
  - Training data: once you have decided on features, you will need
    training data. If F1 and F2 are the features, the goal of
    training is to learn a function TS: F1 x F2 -> TileSize.
    Therefore, your training data will consist of tuples of the
    form (f1, f2, t), where f1 and f2 are values of the features
    and t is the optimal tile size for those values. To create
    these tuples, you will need to determine optimal tile sizes for
    a variety of loop nests on a variety of architectures. For
    matrix multiplication, you can use ATLAS to find optimal tile
    sizes. For stencil codes, you can use a polyhedral compiler
    like Pluto to generate code for a specific tile size. (A sketch
    of this pipeline appears after this list.)
  - Machine learning system: you are free to use any system you
    like. M5 is one possibility. Deep neural networks are hot
    nowadays, so you can also consider those.
  - You can do this project in stages. For example, a first step
    might be to restrict yourself to MMM, train using MMM data, and
    see if the function your system learns resembles the analytical
    one from Yotov et al. Similarly, you can train another system
    for stencil codes and compare with Cummins et al. Of course,
    the ultimate goal is to build a single tile size predictor that
    can handle any affine loop nest.
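
As a sketch of the training pipeline described in the list above, the
following Python code builds (f1, f2, t) tuples and fits a small
neural-network regressor using scikit-learn's MLPRegressor. The
feature set (problem size, cache size) and the measure_runtime
function are illustrative placeholders: in practice you would time
Pluto- or ATLAS-generated code on real machines instead of using a
synthetic runtime model.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def measure_runtime(problem_size, cache_kb, tile_size):
        # Placeholder for a real measurement: generate code for this
        # tile size (e.g., with Pluto), compile it, run it on the
        # target machine, and return the measured time. Here we use a
        # synthetic model whose minimum tracks the cache capacity.
        best = np.sqrt(cache_kb * 1024 / 24)
        return 1.0 + ((tile_size - best) / best) ** 2

    candidates = [8, 16, 24, 32, 48, 64, 96, 128]

    # Build training tuples (f1, f2, t): features -> best tile size found.
    X, y = [], []
    rng = np.random.default_rng(0)
    for _ in range(200):
        n = int(rng.integers(256, 4096))             # feature f1: problem size
        cache = int(rng.choice([16, 32, 64, 256]))   # feature f2: cache size (KB)
        times = [measure_runtime(n, cache, t) for t in candidates]
        X.append([n, cache])
        y.append(candidates[int(np.argmin(times))])  # label t: best tile size

    # Any regressor works here (M5 model trees are another option);
    # standardizing the features would help a real network converge.
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    model.fit(X, y)
    print(model.predict([[1024, 32]]))   # predicted tile size for a new configuration

The same skeleton supports the staged plan above: restrict the data to
MMM first, then to stencils, then mix loop nests.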
 
Project deliverables and deadlines:
        
  - (Nov 1) A clear statement in English describing your project
    proposal.
  - (Nov 8) A survey of analytical models, polyhedral compilers,
    and neural networks.
  - (Dec 6) A tool that takes loop nest features and machine
    features as input and outputs the tile size to use in the code
    for that machine.
  - (Dec 6) A project report, written like an ACM conference paper,
    that summarizes the work you did.
Papers: 
        
        
  - Is Search Really Necessary to Generate High-Performance BLAS?
    Kamen Yotov, Xiaoming Li, Gang Ren, Maria Garzaran, David
    Padua, Keshav Pingali, Paul Stodghill. PLDI 2003.
  - A Practical Automatic Polyhedral Parallelizer and Locality
    Optimizer. Uday Bondhugula, A. Hartono, J. Ramanujam,
    P. Sadayappan. PLDI 2008, Tucson, Arizona.
  - Autotuning OpenCL Workgroup Size for Stencil Patterns. Chris
    Cummins, Pavlos Petoumenos, Michel Steuwer, Hugh Leather. In
    Proceedings of the 6th International Workshop on Adaptive
    Self-tuning Computing Systems (ADAPT'16).
  - Introduction to Neural Networks. Yaser Abu-Mostafa. An online
    course lecture featured on edX.