CS 394P: Automatic Programming: Syllabus
Instructor: Gordon Novak
Office: TAY 4.140C. Office Hours: MWF 10-11 and after class.
Class Time: MWF 2:00 - 3:00 in BUR 134      
Unique No. 55860.
Prerequisites:
Graduate Standing; CS 375 (Compilers) and CS 381K (Artificial Intelligence)
or equivalent are recommended.
Class Notes: purchase at WEL 2.228.
Books:
- Lowry and McCartney, eds., Automating Software Design (at PCL reserve desk).
- Jones, Gomard, and Sestoft, Partial Evaluation and Automatic Program
  Generation (available online as PDF or PostScript).
- Czarnecki, K. and Eisenecker, U., Generative Programming: Methods, Tools
  and Applications, Pearson, 2000.
Class Web Page:
http://www.cs.utexas.edu/users/novak/cs394p.html
Readings:
http://www.cs.utexas.edu/users/novak/cs394ppapers.html
CS Directory: /projects/cs394p/
FTP Directory:
ftp://ftp.cs.utexas.edu/pub/novak/cs394p/
Classwork and Grading:
Grades will be based on programming assignments, in-class presentations,
and class participation. The first part of the course will consist of
lectures, with programming assignments that involve pattern matching,
optimization, partial evaluation, object-oriented programming,
the GLISP system, and graphical programming based on views.
The assignments will generally be small, with the intent of gaining some
experience with each kind of system. Papers to be read will be assigned
corresponding to lecture topics. The latter part of the course will
focus on papers in the literature; each student will be expected to
present some of these papers to the class for discussion.
Guest lecturers who are active researchers in the area will present
lectures on their research.
Course Outline:
- Introduction to Automatic Programming
- Program transformation by pattern matching
- Program analysis, optimization, and transformation
- Data flow and control flow analysis
- Memoization
- Finite differencing
- Partial evaluation
- Object-oriented and Aspect-oriented programming
- Glisp, views, graphical programming: VIEWAS, MKV, VIP, APS, GPS
- Transformational and deductive synthesis (Refine, KIDS; Manna & Waldinger)
- Very High Level Languages (SETL, SML)
- Scientific Program Generation (Kant)
Course Rationale:
This course is intended to give the student familiarity with the basic
research approaches to automatic programming and some experience with
useful techniques for program generation. This is a way to get
"higher on the food chain": instead of working hard to write programs,
write a program to write programs. Knowing the right techniques can
make the job vastly easier.
The way that people write programs today is not very different from
the way it was 40 years ago; the languages are not much
different either. Automatic Programming seeks to make programming
much easier, faster, cheaper, more reliable, and higher-level than
ordinary programming.
Several problems stand in the way of achieving these goals. One is
efficiency: it is easy to generate very inefficient programs from
high-level specifications; for example, it is easy to generate a
sorting program that takes exponential time. Much of the work of
human programmers is directed at efficiency; automatically producing
efficient code can be a significant problem. A second problem is
generality: while ordinary languages are low-level, they do allow
programs for any problem to be expressed; a high-level specification
language may work only for a narrow class of problems. A third
problem is usability by humans: a specification language that is as
hard to use as an ordinary language does not help.
Each of the technical topics to be covered in lecture and its relevance
to automatic programming is briefly discussed below.
Program Transformation by Patterns:
Much of the knowledge of programming is in the form of patterns:
data structure patterns, algorithm patterns, optimization patterns,
design patterns. An effective automatic programming system will
contain and use a great deal of knowledge, much of it in the form
of patterns.
There are two ways that patterns are often used:
- Substitute values into a pattern, e.g. make a linked-list data
structure with a given element type.
- Transform code using input and output patterns, e.g.
(* ?x 1) → ?x, where ?x is a pattern
variable that will match anything.
Program transformation by patterns has two steps: matching (does
part of a program match an input pattern?) followed by
substitution (substituting parts of the existing program into
an output pattern). Pattern transformation can be very powerful.
We will do exercises that use patterns for optimization, specialization
of generic programs, and language translation.
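As a concrete illustration of the two steps, here is a minimal sketch in Python (the tuple representation of expressions and the function names are invented for this example; the course exercises use Lisp):

```python
# Sketch of program transformation by patterns. Expressions are nested
# tuples such as ("*", "x", 1); symbols beginning with "?" are pattern
# variables that match any subexpression.

def match(pattern, expr, bindings=None):
    """Try to match expr against pattern; return a bindings dict or None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:                    # variable seen before:
            return bindings if bindings[pattern] == expr else None
        bindings[pattern] = expr                   # bind it to this subtree
        return bindings
    if isinstance(pattern, tuple) and isinstance(expr, tuple):
        if len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):            # match element-wise
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None   # literal must be equal

def substitute(pattern, bindings):
    """Fill an output pattern with the values bound by match."""
    if isinstance(pattern, str) and pattern in bindings:
        return bindings[pattern]
    if isinstance(pattern, tuple):
        return tuple(substitute(p, bindings) for p in pattern)
    return pattern

def transform(rule, expr):
    """Apply one (input-pattern, output-pattern) rule if it matches."""
    lhs, rhs = rule
    b = match(lhs, expr)
    return substitute(rhs, b) if b is not None else expr

# The optimization (* ?x 1) -> ?x from the text:
rule = (("*", "?x", 1), "?x")
print(transform(rule, ("*", ("+", "a", "b"), 1)))  # -> ('+', 'a', 'b')
```

Note that the same machinery handles both uses of patterns above: substitution alone instantiates a code template, while match followed by substitution rewrites existing code.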
Two issues complicate transformation using patterns. One is termination
(whether repeated application of patterns will terminate or loop);
we will review the Knuth-Bendix completion algorithm. A second problem is
whether applying one transformation will keep the program from matching
another one. Most patterns require localized information (to match
against the input part of the pattern); however, the transformation
specified by a pattern may delocalize information. It may be
that some sequence of transformations will produce exactly the program
we want; but how can we find that sequence?
Compiler Optimization:
Compiler courses such as CS 375 are usually unable to devote much time
to optimization. Compiler optimization is important for several reasons.
In order to perform optimization correctly, the compiler must first
perform program analysis. This section introduces data flow and
control flow analysis, program transformations, and requirements that must
be met in order for optimizing transformations to be correct. It is
useful to know what kinds of optimizations a compiler can do; automatic
programming researchers do not need to worry about inefficiencies that
a compiler can remove.
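For instance, constant folding is among the simplest optimizing transformations a compiler performs. A toy Python sketch (the tuple representation of expressions is assumed for illustration):

```python
# Toy constant-folding pass. Expressions are leaves (numbers or variable
# names) or tuples (op, left, right).
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Recursively replace subexpressions whose operands are all constants."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)      # both operands known: compute now
    return (op, left, right)

print(fold(("+", "x", ("*", 2, 3))))  # -> ('+', 'x', 6)
```

Because the compiler will do this anyway, a program generator is free to emit code like `x + 2 * 3` without worrying about the wasted multiplication.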
Memoization:
Memoization is like caching: saving previously computed values of an
expensive function. If the function is called again with the same argument
value, the previously computed value can be returned. Much of the code
written by humans is essentially memoization, since every data structure
saves something that could in principle be re-computed.
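A minimal memoization wrapper can be written in a few lines of Python (a sketch; the course exercises use Lisp):

```python
# Memoization: cache results keyed by argument values, so a repeated call
# with the same arguments returns the saved answer instead of recomputing.
from functools import wraps

def memoize(fn):
    cache = {}
    @wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Without memoization this call makes exponentially many recursive calls;
# with it, each value 0..40 is computed exactly once.
print(fib(40))
```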
Finite Differencing:
Finite Differencing replaces repetitive computation of an expensive function
with computation of the change in the function from its previous
value. The idea is old in both mathematics and computer science, but many
kinds of computation can be viewed as finite differencing. It is an
important optimization technique.
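A small illustration in Python (a sketch; the class and names are invented for this example): maintaining a running mean by updating a sum incrementally, rather than re-summing the whole collection after every change.

```python
# Finite differencing: instead of recomputing sum(values) from scratch on
# each update (O(n) per update), maintain the sum and apply only the change.

class RunningMean:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        # the differencing step: total changes by x, not recomputed
        self.total += x
        self.count += 1

    def mean(self):
        return self.total / self.count

rm = RunningMean()
for v in [2, 4, 6, 8]:
    rm.add(v)
print(rm.mean())  # -> 5.0
```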
Partial Evaluation:
Partial Evaluation is the technique of evaluating as much of a program
as possible at compile time, especially if some inputs of the program
are constant; the result is a specialized version of the program that runs
faster. Partial evaluation can be remarkably powerful: specializing a
partial evaluator with respect to an interpreter yields a compiler (the
first Futamura projection). For automatic programming, partial evaluation
allows high-level generic programs that interpret declarative
specifications to be reused and specialized for an application.
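The classic textbook example is specializing a power function for a known exponent. A hand-done sketch in Python (the function names are invented; a real partial evaluator would perform this transformation automatically):

```python
# Partial evaluation by hand: power(x, n) is the general program; when the
# exponent n is known at "compile time", we can generate a residual program
# with the loop unrolled away.

def power(x, n):          # general program: both inputs are dynamic
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):  # n is the static (known) input
    # build source text for the residual program x * x * ... * x
    body = " * ".join(["x"] * n) if n > 0 else "1"
    src = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(src, namespace)  # generate the specialized function
    return namespace[f"power_{n}"]

cube = specialize_power(3)   # residual program: return x * x * x
print(cube(5), power(5, 3))  # -> 125 125
```

The specialized version does no loop bookkeeping at run time; everything that depended only on `n` was evaluated when the code was generated.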
Object-Oriented Programming:
Ideally, OOP allows reuse of methods defined for general classes of
objects; when it works, one gets those programs for free. We will look
at how OOP works and at its advantages and problems. Exercises using
a simple OOP system implemented in Lisp will stress writing methods
that are as general as possible (and therefore reusable) and will examine
the performance of OOP.
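The flavor of such reuse can be sketched in Python (the classes and method names here are invented for illustration; the course exercises use a Lisp OOP system):

```python
# A method written once on a general class is inherited by every subclass
# that supplies the few operations it depends on.

class Collection:
    def items(self):                 # subclasses must provide this
        raise NotImplementedError
    def total(self):                 # generic method, written once
        return sum(self.items())
    def mean(self):                  # generic, reuses total()
        return self.total() / len(self.items())

class Scores(Collection):
    def __init__(self, scores):
        self.scores = scores
    def items(self):                 # the only method Scores must define
        return self.scores

print(Scores([80, 90, 100]).mean())  # -> 90.0
```

By defining only `items`, the subclass gets `total` and `mean` for free; the more general the inherited methods, the more such reuse one gets.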
Glisp and Views: VIEWAS, MKV, VIP, APS:
Glisp (Generic Lisp) is a Lisp-based language with abstract data types.
It allows generic procedures to be specialized for an application through
views that describe how a concrete (application) type implements
an abstract type known to the system.
We will examine how this extends OOP and how partial evaluation
produces efficient output code. Exercises will illustrate generation
of code for applications using graphical user interfaces. VIEWAS and
MKV are interfaces that help the user to create views. VIP creates
programs from diagrams composed of "boxes" that represent physical
and mathematical principles, and connections between boxes.
APS (Automatic Programming Server) is a web-based server that can
synthesize programs in several languages and serve them to the user.
GPS (Graphical Programming System) allows programs to be specified by
connecting component boxes graphically.
Transformational and Deductive Synthesis:
A major school of research is generation of programs from specifications
stated in First-Order Predicate Calculus (FOPC). FOPC has several advantages:
it is formal, the program generation process can be proved correct,
and the specifications may be relatively compact for some kinds of
problems. It has some disadvantages too: writing specifications in FOPC
is more difficult than programming in typical programming languages;
not all problems are easily specified in FOPC; inefficient programs may
be generated.
We will examine in some detail the work on Refine and KIDS at Kestrel
Institute, some of the best work in this genre. We will also examine
the approach of Manna and Waldinger.
Scientific Program Generation:
SciNapse is a system that can generate scientific programs that compute
numerical solutions to systems of partial differential equations.
It uses specifications that are much shorter than the code it generates
and that experts in the areas of application can easily understand.
We will invite Dr. Elaine Kant to discuss her work on this system.
Very High Level Languages:
A common (and valid) criticism of ordinary programming languages is that
they are low-level. Some have proposed that Very High Level Languages
such as SETL (which allows sets as a data type and set operations as
part of the language) and SML (which allows type-checked generic
functions and functors [collections of type mappings and functions])
will solve the programming problem.
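SETL's set formers have a close modern analogue in Python's set comprehensions; as a small illustration of the style (Python, not SETL):

```python
# A set-former-style definition: the set of primes below n, written as a
# direct transcription of its mathematical description rather than as an
# explicit loop with flags and accumulators.

def primes_below(n):
    # { p : 2 <= p < n | no d in 2..p-1 divides p }
    return {p for p in range(2, n)
            if all(p % d != 0 for d in range(2, p))}

print(sorted(primes_below(20)))  # -> [2, 3, 5, 7, 11, 13, 17, 19]
```

The high-level statement is short and clearly correct, but (as the efficiency discussion above warns) a naive reading of it is far slower than a tuned sieve.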
Programmer's Apprentice:
The Programmer's Apprentice of Rich and Waters was an attempt to provide
an interactive system to help a programmer write programs. While not
very successful, it provides some good lessons.
Intentional Programming:
Intentional Programming (IP) was a project at Microsoft Research; it has
since been spun off into a company called Intentional Software Corp.
In IP, the primary representation of a program is an abstract syntax
tree (AST). ASTs may be parsed from, or unparsed to, textual or graphical
languages. Transformations called "enzymes" operate on ASTs, producing new
ASTs; compilation consists of transformations that ultimately reduce the
tree to assembly code.
Composition of Library Programs:
Given a logic representation of the results computed by subroutines in a
library, formal deduction can be used to compose subroutines to solve a
given problem, e.g. in astronomical calculations.
Human Usability:
No automatic programming system can be successful unless humans can
use it effectively. We will try to consider usability as we
discuss each approach to automatic programming.
Issues in Automatic Programming:
- Language for specifying programs:
- Special-purpose languages
- Very-high-level or wide-spectrum languages
- Graphical languages
- Can one language be good for all kinds of programming?
- Reuse of knowledge or code:
- What knowledge is reused?
- How is it represented?
- Reasoning:
- What form of reasoning about programs is done by the system?
- Is there a combinatoric explosion problem?
- What is the benefit of automatic programming?
- Increased speed (programming; execution)
- Lower cost (programming; execution)
- Correctness
- Maintainability, modifiability
- How hard is it to use the system?
- Can regular (non-Ph.D.) programmers use it?
- How much learning is required?
- What kinds of interaction with the system are required?
- Performance:
- Is performance good enough for the system to be usable?
- Can automatically generated programs beat human programs?
- Software engineering:
- Can programs be modified and maintained?
- Can someone else understand the code?
- How can old code be brought up to the new standard?
- How can requirements be related to code?