Project contacts: Roshan Dathathri
        
        
Project description:
Affine loop nests are arbitrarily nested loop nests in which the
array accesses and the loop bounds are affine functions (linear
functions with a constant offset) of the program parameters. They
form the compute-intensive core of scientific computations like
linear-algebra kernels and stencil-style computations. Compilers
(using the polyhedral model) can statically analyze these loop
nests and generate parallel, tiled code without programmer
intervention. However, the performance of the generated code is
highly sensitive to the tile sizes, which depend on the machine
architecture (but not on the problem sizes). So the programmer has
to choose the right tile sizes to get the best performance, and
exhaustively searching for the best tile sizes (auto-tuning) is
time-consuming.
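
To make tiling concrete, here is a minimal Python sketch of a tiled
matrix-multiplication loop nest. Production code would be generated C
or Fortran, but the loop structure is the same, and the tile size T is
exactly the parameter this project aims to predict.

    import numpy as np

    def matmul_tiled(A, B, T):
        # Tiled matrix multiplication: C = A @ B.
        # The three outer loops walk over T x T tiles; the inner
        # loops do the work inside one tile. Performance depends
        # strongly on T relative to the machine's cache sizes.
        n = A.shape[0]
        C = np.zeros((n, n))
        for ii in range(0, n, T):                          # tile loops
            for jj in range(0, n, T):
                for kk in range(0, n, T):
                    for i in range(ii, min(ii + T, n)):    # intra-tile loops
                        for j in range(jj, min(jj + T, n)):
                            s = C[i, j]
                            for k in range(kk, min(kk + T, n)):
                                s += A[i, k] * B[k, j]
                            C[i, j] = s
        return C

    A = np.random.rand(128, 128)
    B = np.random.rand(128, 128)
    assert np.allclose(matmul_tiled(A, B, 32), A @ B)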
        
Analytical models have been proposed to compute the best tile
sizes. These models meticulously analyze both machine architecture
features and program features to find an analytical function that
gives the optimal tile sizes in terms of the values of those
features. Yotov et al. in PLDI 2003 proposed an analytical model
for determining optimal tile sizes for matrix multiplication, and
showed that it can do almost as well as exhaustive search. However,
their model is specific to one problem: matrix multiplication.
Extending it to other affine loop nests would require analyzing
each problem independently.
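
For intuition only, here is a toy capacity-style heuristic in Python.
It is not Yotov et al.'s model (which also accounts for cache line
size, associativity, register tiling, and loop order), but it shows
the flavor of deriving a tile size analytically from a machine feature.

    import math

    def capacity_tile_size(cache_bytes, elem_bytes=8, arrays=3):
        # Largest T such that `arrays` tiles of T x T elements fit in
        # the cache, i.e. arrays * T^2 * elem_bytes <= cache_bytes.
        return math.isqrt(cache_bytes // (arrays * elem_bytes))

    # A 32 KiB L1 data cache suggests T ~ 36 for double-precision MMM.
    print(capacity_tile_size(32 * 1024))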
        
Recent work has tried to use machine learning techniques to learn
optimal tile sizes automatically. Cummins et al. in ADAPT 2016 use
classifiers and regressors to learn tile sizes for stencil GPU
kernels. The advantage is that the best tile sizes for different
stencil problems can be learned automatically. Neural networks seem
well suited to learning tile sizes because the analytical functions
for optimal tile sizes are usually nonlinear.
        
The goal of this project is to build a machine learning system -
possibly based on deep neural networks - that takes affine loop
nest features and architecture features as inputs and predicts the
best tile size for that loop nest. To begin with, you can restrict
yourself to perfectly nested loops. Here are things to think about.
        
  - Loop nest features and architectural features relevant for
    optimal tile size determination: you can look at the Yotov et
    al. analytical model to see what features they used for MMM.
    The Cummins et al. machine learning model might also provide
    insight. You can also investigate techniques from machine
    learning for finding relevant features automatically.
  - Training data: once you have decided on features, you will need
    training data. If F1 and F2 are the features, the goal of
    training is to learn a function TS: F1 x F2 -> TileSize.
    Therefore, your training data will consist of tuples of the
    form (f1, f2, t), where f1 and f2 are values of the features
    and t is the optimal tile size for those values. To create
    these tuples, you will need to determine optimal tile sizes for
    a variety of loop nests on a variety of architectures. For
    matrix multiplication, you can use ATLAS to find optimal tile
    sizes. For stencil codes, you can use a polyhedral compiler
    like Pluto to generate code for a specific tile size. (A sketch
    of this pipeline appears after this list.)
  - Machine learning system: you are free to use any system you
    like. M5 is one possibility. Deep neural networks are hot
    nowadays, so you can also consider those.
  - You can do this project in stages. For example, a first step
    might be to restrict yourself to MMM, train using MMM data, and
    see if the function your system learns resembles the analytical
    one from Yotov et al. Similarly, you can train another system
    for stencil codes and compare with Cummins et al. Of course,
    the ultimate goal is to build a single tile size predictor that
    can handle any affine loop nest.
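
As a sketch of the training pipeline described in the list above, the
following Python code builds (f1, f2, t) tuples and fits a small
neural-network regressor using scikit-learn's MLPRegressor. The
feature set (problem size, cache size) and the measure_runtime
function are illustrative placeholders: in practice you would time
Pluto- or ATLAS-generated code on real machines instead of using a
synthetic runtime model.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def measure_runtime(problem_size, cache_kb, tile_size):
        # Placeholder for a real measurement: generate code for this
        # tile size (e.g., with Pluto), compile it, run it on the
        # target machine, and return the measured time. Here we use a
        # synthetic model whose minimum tracks the cache capacity.
        best = np.sqrt(cache_kb * 1024 / 24)
        return 1.0 + ((tile_size - best) / best) ** 2

    candidates = [8, 16, 24, 32, 48, 64, 96, 128]

    # Build training tuples (f1, f2, t): features -> best tile size found.
    X, y = [], []
    rng = np.random.default_rng(0)
    for _ in range(200):
        n = int(rng.integers(256, 4096))             # feature f1: problem size
        cache = int(rng.choice([16, 32, 64, 256]))   # feature f2: cache size (KB)
        times = [measure_runtime(n, cache, t) for t in candidates]
        X.append([n, cache])
        y.append(candidates[int(np.argmin(times))])  # label t: best tile size

    # Any regressor works here (M5 model trees are another option);
    # standardizing the features would help a real network converge.
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    model.fit(X, y)
    print(model.predict([[1024, 32]]))   # predicted tile size for a new configuration

The same skeleton supports the staged plan above: restrict the data to
MMM first, then to stencils, then mix loop nests.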
 
Project deliverables and deadlines:
        
  - (Nov 1) A clear statement in English describing your project
    proposal.
  - (Nov 8) A survey of analytical models, polyhedral compilers,
    and neural networks.
  - (Dec 6) A tool that takes loop nest features and machine
    features as input and outputs the tile size to use in the code
    for that machine.
  - (Dec 6) A project report, written like an ACM conference paper,
    that summarizes the work you did.
Papers: 
        
        
  - Is Search Really Necessary to Generate High-Performance BLAS?
    Kamen Yotov, Xiaoming Li, Gang Ren, Maria Garzaran, David
    Padua, Keshav Pingali, Paul Stodghill. PLDI 2003.
  - A Practical Automatic Polyhedral Parallelizer and Locality
    Optimizer. Uday Bondhugula, A. Hartono, J. Ramanujam,
    P. Sadayappan. PLDI 2008, Tucson, Arizona.
  - Autotuning OpenCL Workgroup Size for Stencil Patterns. Chris
    Cummins, Pavlos Petoumenos, Michel Steuwer, Hugh Leather. In
    Proceedings of the 6th International Workshop on Adaptive
    Self-tuning Computing Systems (ADAPT'16).
  - Introduction to Neural Networks. Yaser Abu-Mostafa. An online
    course lecture featured on edX.