CS 394P: Automatic Programming: Syllabus
Instructor: Gordon Novak
Office: TAY 4.140C. Office Hours: MWF 10-11 and after class.
Class Time: MWF 2:00 - 3:00 in BUR 134      
Unique No. 55860.
Prerequisites:
Graduate Standing; CS 375 (Compilers) and CS 381K (Artificial Intelligence)
or equivalent are recommended.
Class Notes: purchase at WEL 2.228.
Books:
- Lowry and McCartney, eds., Automating Software Design (at PCL reserve desk).
- Jones, Gomard, and Sestoft, Partial Evaluation and Automatic Program
  Generation (available online as PDF or PostScript).
- Czarnecki, K. and Eisenecker, U., Generative Programming: Methods, Tools
  and Applications, Pearson, 2000.
Class Web Page:
http://www.cs.utexas.edu/users/novak/cs394p.html
Readings:
http://www.cs.utexas.edu/users/novak/cs394ppapers.html
CS Directory: /projects/cs394p/
FTP Directory:
ftp://ftp.cs.utexas.edu/pub/novak/cs394p/
Classwork and Grading:
Grades will be based on programming assignments, in-class presentations,
and class participation. The first part of the course will consist of
lectures, with programming assignments that involve pattern matching,
optimization, partial evaluation, object-oriented programming,
the GLISP system, and graphical programming based on views.
The assignments will generally be small, with the intent of gaining some
experience with each kind of system. Papers to be read will be assigned
corresponding to lecture topics. The latter part of the course will
focus on papers in the literature; each student will be expected to
present some of these papers to the class for discussion.
Guest lecturers who are active researchers in the area will present
lectures on their research.
Course Outline:
- Introduction to Automatic Programming
- Program transformation by pattern matching
- Program analysis, optimization, and transformation
- Data flow and control flow analysis
- Memoization
- Finite differencing
- Partial evaluation
- Object-oriented and Aspect-oriented programming
- Glisp, views, graphical programming: VIEWAS, MKV, VIP, APS, GPS
- Transformational and deductive synthesis (Refine, KIDS; Manna & Waldinger)
- Very High Level Languages (SETL, SML)
- Scientific Program Generation (Kant)
Course Rationale:
This course is intended to give the student familiarity with the basic
research approaches to automatic programming and some experience with
useful techniques for program generation. This is a way to get
"higher on the food chain": instead of working hard to write programs,
write a program to write programs. Knowing the right techniques can
make the job vastly easier.
The way that people write programs today is not very different from
the way it was 40 years ago; the languages are not much
different either. Automatic Programming seeks to make programming
much easier, faster, cheaper, more reliable, and higher-level than
ordinary programming.
Several problems stand in the way of achieving these goals. One is
efficiency: it is easy to generate very inefficient programs from
high-level specifications; for example, it is easy to generate a
sorting program that takes exponential time. Much of the work of
human programmers is directed at efficiency; automatically producing
efficient code can be a significant problem. A second problem is
generality: while ordinary languages are low-level, they do allow
programs for any problem to be expressed; a high-level specification
language may work only for a narrow class of problems. A third
problem is usability by humans: a specification language that is as
hard to use as an ordinary language does not help.
Each of the technical topics to be covered in lecture and its relevance
to automatic programming is briefly discussed below.
Program Transformation by Patterns:
Much of the knowledge of programming is in the form of patterns:
data structure patterns, algorithm patterns, optimization patterns,
design patterns. An effective automatic programming system will
contain and use a great deal of knowledge, much of it in the form
of patterns.
There are two ways that patterns are often used:
- Substitute values into a pattern, e.g. make a linked-list data
structure with a given element type.
- Transform code using input and output patterns, e.g.
(* ?x 1) → ?x, where ?x is a pattern
variable that will match anything.
Program transformation by patterns has two steps: matching (does
part of a program match an input pattern?) followed by
substitution (substituting parts of the existing program into
an output pattern). Pattern transformation can be very powerful.
We will do exercises that use patterns for optimization, specialization
of generic programs, and language translation.
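As a concrete illustration of the two steps, here is a minimal sketch in Python (the tuple representation of expressions and the function names are invented for this example; the course exercises use Lisp):

```python
# Sketch of program transformation by patterns. Expressions are nested
# tuples such as ("*", "x", 1); symbols beginning with "?" are pattern
# variables that match any subexpression.

def match(pattern, expr, bindings=None):
    """Try to match expr against pattern; return a bindings dict or None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:                    # variable seen before:
            return bindings if bindings[pattern] == expr else None
        bindings[pattern] = expr                   # bind it to this subtree
        return bindings
    if isinstance(pattern, tuple) and isinstance(expr, tuple):
        if len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):            # match element-wise
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None   # literal must be equal

def substitute(pattern, bindings):
    """Fill an output pattern with the values bound by match."""
    if isinstance(pattern, str) and pattern in bindings:
        return bindings[pattern]
    if isinstance(pattern, tuple):
        return tuple(substitute(p, bindings) for p in pattern)
    return pattern

def transform(rule, expr):
    """Apply one (input-pattern, output-pattern) rule if it matches."""
    lhs, rhs = rule
    b = match(lhs, expr)
    return substitute(rhs, b) if b is not None else expr

# The optimization (* ?x 1) -> ?x from the text:
rule = (("*", "?x", 1), "?x")
print(transform(rule, ("*", ("+", "a", "b"), 1)))  # -> ('+', 'a', 'b')
```

Note that the same machinery handles both uses of patterns above: substitution alone instantiates a code template, while match followed by substitution rewrites existing code.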
Two issues complicate transformation using patterns. One is termination
(whether repeated application of patterns will terminate or loop);
we will review the Knuth-Bendix completion algorithm. A second problem is
whether applying one transformation will keep the program from matching
another one. Most patterns require localized information (to match
against the input part of the pattern); however, the transformation
specified by a pattern may delocalize information. It may be
that some sequence of transformations will produce exactly the program
we want; but how can we find that sequence?
Compiler Optimization:
Compiler courses such as CS 375 are usually unable to devote much time
to optimization. Compiler optimization is important for several reasons.
In order to perform optimization correctly, the compiler must first
perform program analysis. This section introduces data flow and
control flow analysis, program transformations, and requirements that must
be met in order for optimizing transformations to be correct. It is
useful to know what kinds of optimizations a compiler can do; automatic
programming researchers do not need to worry about inefficiencies that
a compiler can remove.
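For instance, constant folding is among the simplest optimizing transformations a compiler performs. A toy Python sketch (the tuple representation of expressions is assumed for illustration):

```python
# Toy constant-folding pass. Expressions are leaves (numbers or variable
# names) or tuples (op, left, right).
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Recursively replace subexpressions whose operands are all constants."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)      # both operands known: compute now
    return (op, left, right)

print(fold(("+", "x", ("*", 2, 3))))  # -> ('+', 'x', 6)
```

Because the compiler will do this anyway, a program generator is free to emit code like `x + 2 * 3` without worrying about the wasted multiplication.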
Memoization:
Memoization is like caching: saving previously computed values of an
expensive function. If the function is called again with the same argument
value, the previously computed value can be returned. Much of the code
written by humans is essentially memoization, since every data structure
saves something that could in principle be re-computed.
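A minimal memoization wrapper can be written in a few lines of Python (a sketch; the course exercises use Lisp):

```python
# Memoization: cache results keyed by argument values, so a repeated call
# with the same arguments returns the saved answer instead of recomputing.
from functools import wraps

def memoize(fn):
    cache = {}
    @wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Without memoization this call makes exponentially many recursive calls;
# with it, each value 0..40 is computed exactly once.
print(fib(40))
```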
Finite Differencing:
Finite Differencing replaces repetitive computation of an expensive function
with computation of the change in the function from its previous
value. The idea is old in both mathematics and computer science, but many
kinds of computation can be viewed as finite differencing. It is an
important optimization technique.
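A small illustration in Python (a sketch; the class and names are invented for this example): maintaining a running mean by updating a sum incrementally, rather than re-summing the whole collection after every change.

```python
# Finite differencing: instead of recomputing sum(values) from scratch on
# each update (O(n) per update), maintain the sum and apply only the change.

class RunningMean:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        # the differencing step: total changes by x, not recomputed
        self.total += x
        self.count += 1

    def mean(self):
        return self.total / self.count

rm = RunningMean()
for v in [2, 4, 6, 8]:
    rm.add(v)
print(rm.mean())  # -> 5.0
```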
Partial Evaluation:
Partial Evaluation is the technique of evaluating as much of a program
as possible at compile time, especially if some inputs of the program
are constant; the result is a specialized version of the program that runs
faster. Partial evaluation can be remarkably powerful: specializing a
partial evaluator with respect to an interpreter yields a compiler (the
first Futamura projection). For automatic programming, partial evaluation
allows high-level generic programs that interpret declarative
specifications to be reused and specialized for an application.
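The classic textbook example is specializing a power function for a known exponent. A hand-done sketch in Python (the function names are invented; a real partial evaluator would perform this transformation automatically):

```python
# Partial evaluation by hand: power(x, n) is the general program; when the
# exponent n is known at "compile time", we can generate a residual program
# with the loop unrolled away.

def power(x, n):          # general program: both inputs are dynamic
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):  # n is the static (known) input
    # build source text for the residual program x * x * ... * x
    body = " * ".join(["x"] * n) if n > 0 else "1"
    src = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(src, namespace)  # generate the specialized function
    return namespace[f"power_{n}"]

cube = specialize_power(3)   # residual program: return x * x * x
print(cube(5), power(5, 3))  # -> 125 125
```

The specialized version does no loop bookkeeping at run time; everything that depended only on `n` was evaluated when the code was generated.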
Object-Oriented Programming:
Ideally, OOP allows reuse of methods defined for general classes of
objects; when it works, one gets those programs for free. We will look
at how OOP works and at its advantages and problems. Exercises using
a simple OOP system implemented in Lisp will stress writing methods
that are as general as possible (and therefore reusable) and will examine
the performance of OOP.
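The flavor of such reuse can be sketched in Python (the classes and method names here are invented for illustration; the course exercises use a Lisp OOP system):

```python
# A method written once on a general class is inherited by every subclass
# that supplies the few operations it depends on.

class Collection:
    def items(self):                 # subclasses must provide this
        raise NotImplementedError
    def total(self):                 # generic method, written once
        return sum(self.items())
    def mean(self):                  # generic, reuses total()
        return self.total() / len(self.items())

class Scores(Collection):
    def __init__(self, scores):
        self.scores = scores
    def items(self):                 # the only method Scores must define
        return self.scores

print(Scores([80, 90, 100]).mean())  # -> 90.0
```

By defining only `items`, the subclass gets `total` and `mean` for free; the more general the inherited methods, the more such reuse one gets.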
Glisp and Views: VIEWAS, MKV, VIP, APS:
Glisp (Generic Lisp) is a Lisp-based language with abstract data types.
It allows generic procedures to be specialized for an application through
views that describe how a concrete (application) type implements
an abstract type known to the system.
We will examine how this extends OOP and how partial evaluation
produces efficient output code. Exercises will illustrate generation
of code for applications using graphical user interfaces. VIEWAS and
MKV are interfaces that help the user to create views. VIP creates
programs from diagrams composed of "boxes" that represent physical
and mathematical principles, and connections between boxes.
APS (Automatic Programming Server) is a web-based server that can
synthesize programs in several languages and serve them to the user.
GPS (Graphical Programming System) allows programs to be specified by
connecting component boxes graphically.
Transformational and Deductive Synthesis:
A major school of research is generation of programs from specifications
stated in First-Order Predicate Calculus (FOPC). FOPC has several advantages:
it is formal, the program generation process can be proved correct,
and the specifications may be relatively compact for some kinds of
problems. It has some disadvantages too: writing specifications in FOPC
is more difficult than programming in typical programming languages;
not all problems are easily specified in FOPC; inefficient programs may
be generated.
We will examine in some detail the work on Refine and KIDS at Kestrel
Institute, some of the best work in this genre. We will also examine
the approach of Manna and Waldinger.
Scientific Program Generation:
SciNapse is a system that can generate scientific programs that compute
numerical solutions to systems of partial differential equations.
It uses specifications that are much shorter than the code it generates
and that experts in the areas of application can easily understand.
We will invite Dr. Elaine Kant to discuss her work on this system.
Very High Level Languages:
A common (and valid) criticism of ordinary programming languages is that
they are low-level. Some have proposed that Very High Level Languages
such as SETL (which allows sets as a data type and set operations as
part of the language) and SML (which allows type-checked generic
functions and functors [collections of type mappings and functions])
will solve the programming problem.
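SETL's set formers have a close modern analogue in Python's set comprehensions; as a small illustration of the style (Python, not SETL):

```python
# A set-former-style definition: the set of primes below n, written as a
# direct transcription of its mathematical description rather than as an
# explicit loop with flags and accumulators.

def primes_below(n):
    # { p : 2 <= p < n | no d in 2..p-1 divides p }
    return {p for p in range(2, n)
            if all(p % d != 0 for d in range(2, p))}

print(sorted(primes_below(20)))  # -> [2, 3, 5, 7, 11, 13, 17, 19]
```

The high-level statement is short and clearly correct, but (as the efficiency discussion above warns) a naive reading of it is far slower than a tuned sieve.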
Programmer's Apprentice:
The Programmer's Apprentice of Rich and Waters was an attempt to provide
an interactive system to help a programmer write programs. While not
very successful, it provides some good lessons.
Intentional Programming:
Intentional Programming (IP) was a project at Microsoft Research; it has
since been spun off into a company called Intentional Software Corp.
In IP, the primary representation of a program is an abstract syntax
tree (AST). ASTs may be parsed from, or unparsed to, textual or graphical
languages. Transformations called "enzymes" operate on ASTs, producing new
ASTs; compilation consists of transformations that ultimately reduce the
tree to assembly code.
Composition of Library Programs:
Given a logic representation of the results computed by subroutines in a
library, formal deduction can be used to compose subroutines to solve a
given problem, e.g. in astronomical calculations.
Human Usability:
No automatic programming system can be successful unless humans can
use it effectively. We will try to consider usability as we
discuss each approach to automatic programming.
Issues in Automatic Programming:
- Language for specifying programs:
- Special-purpose languages
- Very-high-level or wide-spectrum languages
- Graphical languages
- Can one language be good for all kinds of programming?
- Reuse of knowledge or code:
- What knowledge is reused?
- How is it represented?
- Reasoning:
- What form of reasoning about programs is done by the system?
- Is there a combinatoric explosion problem?
- What is the benefit of automatic programming?
- Increased speed (programming; execution)
- Lower cost (programming; execution)
- Correctness
- Maintainability, modifiability
- How hard is it to use the system?
- Can regular (non-Ph.D.) programmers use it?
- How much learning is required?
- What kinds of interaction with the system are required?
- Performance:
- Is performance good enough for the system to be usable?
- Can automatically generated programs beat human programs?
- Software engineering:
- Can programs be modified and maintained?
- Can someone else understand the code?
- How can old code be brought up to the new standard?
- How can requirements be related to code?