Compiling TensorFlow Graphs to Parla

Project Contacts: Ian Henriksen

Project Description:
Existing deep learning frameworks are built around the idea of a DAG of kernel calls on different devices: users construct static DAGs of kernel calls, and the framework provides a heterogeneous execution backend for those DAGs. Optimal scheduling of such a series of kernel calls is an NP-complete problem, but some good heuristics exist (see https://arxiv.org/abs/1711.01912).
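To make the scheduling problem concrete, here is an illustrative sketch (not taken from the project or the cited paper) of one common family of heuristics: a greedy list scheduler that walks the DAG in topological order and places each kernel call on the device where it would finish earliest. The function and its inputs are hypothetical; real heuristics also account for communication costs between devices, which this sketch ignores.

```python
def greedy_schedule(tasks, deps, cost, devices):
    """Greedy earliest-finish-time placement for a DAG of kernel calls.

    tasks:   task ids in topological order
    deps:    task id -> list of predecessor task ids
    cost:    (task, device) -> execution time on that device
    devices: list of device ids
    Returns task id -> (device, start_time, finish_time).
    """
    device_free = {d: 0.0 for d in devices}  # when each device is next idle
    placed = {}
    for t in tasks:
        # a task may start only after all of its predecessors finish
        ready = max((placed[p][2] for p in deps[t]), default=0.0)
        # pick the device on which this task would finish earliest
        best = min(devices,
                   key=lambda d: max(ready, device_free[d]) + cost[(t, d)])
        start = max(ready, device_free[best])
        finish = start + cost[(t, best)]
        device_free[best] = finish
        placed[t] = (best, start, finish)
    return placed
```

For example, a two-task chain whose first kernel is faster on the GPU and whose second is faster on the CPU ends up split across both devices, even though no single-device schedule would do that.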

We have been developing a prototype tasking system, Parla, in Python that supports dynamic task creation and explicit placement of tasks on the various devices available on a machine. This tasking system operates at a lower level than the task graphs built by TensorFlow, in that all data movement between devices is done explicitly by the programmer. It also allows arbitrary tasks to be created dynamically, with each task running on a specific user-specified device. It is available at https://github.com/ut-parla/Parla.py.
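The two properties described above can be modeled in a toy way without Parla itself. The sketch below (this is not Parla's actual API; the `Device`, `spawn`, and `copy` names are invented for illustration) uses one worker thread per "device" to show dynamic task creation, user-specified placement, and programmer-directed data movement:

```python
from concurrent.futures import ThreadPoolExecutor

class Device:
    """Toy stand-in for a compute device with its own memory."""
    def __init__(self, name):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=1)  # one stream per device
        self.memory = {}                                # stand-in device memory

    def spawn(self, fn, *args):
        # dynamic task creation: tasks may be submitted at any time,
        # and each runs on this specific device
        return self._pool.submit(fn, *args)

def copy(src, dst, key):
    # explicit data movement: nothing moves unless the programmer asks
    dst.memory[key] = src.memory[key]

cpu, gpu = Device("cpu"), Device("gpu")

def produce():
    cpu.memory["x"] = [1, 2, 3]

def consume():
    return sum(gpu.memory["x"])

cpu.spawn(produce).result()          # task explicitly placed on the CPU
copy(cpu, gpu, "x")                  # programmer-directed transfer
total = gpu.spawn(consume).result()  # task explicitly placed on the GPU
```

In Parla the placement and data movement are real (CPU cores, GPUs, and their memories), but the programming model is the same shape: the programmer, not the runtime, decides where each task runs and when data crosses device boundaries.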

The goal of this project is to provide a Parla-based backend for TensorFlow or some subset of TensorFlow. The backend does not have to handle every operation that TensorFlow supports, but there should be some specific and realistic TensorFlow computation graphs that it can handle. Exploring better scheduling heuristics and determining how to use Parla efficiently are important aspects of this project.
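As a minimal sketch of what "handling a subset of TensorFlow graphs" could mean, the interpreter below walks a small static op graph (represented here as plain tuples, an assumed representation rather than TensorFlow's actual GraphDef) in topological order and dispatches each node to a kernel. A real backend would emit one placed Parla task per node instead of calling the kernel inline:

```python
import numpy as np

# kernel table for the tiny op subset this sketch supports
KERNELS = {
    "matmul": np.matmul,
    "add": np.add,
}

def run_graph(nodes, feeds, fetch):
    """nodes: (name, op, input_names) triples in topological order."""
    values = dict(feeds)
    for name, op, inputs in nodes:
        values[name] = KERNELS[op](*(values[i] for i in inputs))
    return values[fetch]

# a realistic two-node graph: y = x @ W + b
nodes = [("mm", "matmul", ("x", "W")),
         ("y", "add", ("mm", "b"))]
feeds = {"x": np.array([[1.0, 2.0]]),
         "W": np.eye(2),
         "b": np.array([1.0, 1.0])}
y = run_graph(nodes, feeds, "y")  # -> [[2., 3.]]
```

Starting from an interpreter like this and replacing the inline kernel calls with placed tasks is one plausible path toward the backend; choosing which device each emitted task lands on is exactly where the scheduling heuristics come in.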

Suggested timeline:

Papers: