Package scale.score

This package implements an internal representation of source programs using a control flow graph (CFG) called Scribble.

See:
          Description

Interface Summary
Predicate This interface defines a Scribble CFG traversal predicate.
 

Class Summary
CDG The CDG class builds the control dependence graph for the scribble graph input.
DominanceFrontier This class computes and manages dominance frontiers.
Domination This class computes the dominators and post dominators of nodes in a graph.
InductionVar Record information about a loop induction variable.
Note This class is the base class for the CFG class hierarchy.
PureFunctionAnalyser This class adds purity level information to RoutineDecls.
Scribble This class represents a Scribble graph (CFG).
Scribble2C A class to generate C code from a Scribble CFG.
SSA This class converts a Scribble CFG into the SSA form of the CFG.
 

Package scale.score Description

This package implements an internal representation of source programs using a control flow graph (CFG) called Scribble. The CFG is implemented using Chord instances as the nodes of the graph. Expressions are attached to the CFG nodes. Other representations are available but all optimizations are performed on the CFG while it is in SSA form. The Scribble class is the container class for the CFG.

The scale.score.chords package is used to represent nodes in the CFG. The scale.score.expr package is used to represent expression trees. Each ExprChord instance has a single expression tree attached. The easiest way to understand this is to display the CFG for VERY simple programs using the -sga and -sgb switches on the command line. For example,

  java scale.test.Scale test.c -sga s -sgb s -O s
will display the CFG for test.c both before and after entering SSA form. Make sure that the daVinci executable is accessible on your path. The documentation for the LoopHeaderChord class illustrates a CFG.

Control Flow Graph (CFG)

Scale uses an internal control flow graph to represent programs. All optimizations are performed on this representation. The graph may be in static single assignment (SSA) form. Currently, all major optimizations in Scale are performed when the CFG is in SSA form. While it is possible to go in and out of SSA form, our current optimizations leave the CFG in a valid SSA form making this un-necessary.

Nodes in the CFG are one of two flavors. Executable statements are represent by an instance of the Chord class. Expressions are represented by an instance of the Expr class.

Statements (Chords)

Instances of the Chord class form the nodes of the CFG that are linked together to form the graph. An instance of a SequentialChord has only one successor while a DecisionChord instance has two or more successors. Any node may have one or more predecessors. Classes derived from these two classes are used to represent various executable statements. A simplified class hierarchy is:
Chord <- SequentialChord <- ExprChord <- PhiExprNode
                         <- LoopPreHeaderChord
                         <- LoopHeaderChord <- BeginChord
                         <- LoopExitChord
                         <- LoopTailChord
                         <- BranchChord <- GotoChord
                         <- EndChord <- LeaveChord <- ReturnChord
      <- DecisionChord <- IfThenEsleChord
                       <- SwitchChord
Irreducible graphs are converted to reducible graphs by node-splitting. All implicit and explicit loops in the original program are converted to the same form. For example, the C for statement
  for (int i = 0; i < n; i++) body;
the graph looks like:
         ExprChord (i = 0)
              |
              v
         LoopPreHeader
              |
              v
    ---->LoopHeader
    ^         |
    |         v
    |    IfThenElseChord (i < n) -> LoopExitChord ->
    |         | T
    |         v
    |       body
    |         |
    |         v
    |    ExprNode (i++)
    |         |
    |         v
     <---LoopTailChord
The regularity of this form makes analysis of the graph easier to accomplish; there are only two predecessors to the LoopHeader, all loop exits are marked, and the LoopPreHeader and LoopHeader provide markers for the places that the SSA required Phi nodes may be inserted.

If the loop is placed in the loop test at end form, the graph looks like:

         ExprChord (i = 0)
              |
              v
         IfThenElseChord (i >= n) ------------------>
              |                                      |
              v                                      |
         LoopPreHeader                               |
              |                                      |
              v                                      |
    ---->LoopHeader                                  |
    ^         |                                      |
    |         v                                      |
    |       body                                     |
    |         |                                      |
    |         V                                      |
    |    ExprNode (i++)                              |
    |         |                                      |
    |         v                                      v
    |    IfThenElseChord (i < n) -> LoopExitChord ->
    |         | T
    |         v
     <---LoopTailChord
Basic blocks are not represented explicitly. Rather, a simple detection scheme is used to detect the beginning and and of a basic block: The disadvantage to this scheme is that there are many more nodes in the CFG than would be the case if only explicit basic blocks were nodes. As an example, when processing Phi nodes of a basic block, all CFG nodes in the basic block must be scanned. That is why Scale uses an explicit class (PhiExprChord) for these nodes even though they are identical to any other ExprChord. The advantage to this representation is that manipulations of the graph are straight forward.

It is possible to travel easily in any direction in the graph. However, most current processing in Scale spans the graph in the same way; the backward link is used primarily for inserting and deleting nodes.

Each node has an integer color. To scan every node in the graph, first the base color is incremented by one. Starting at the root node in the graph, all successors whose color is different from the base color are placed on a stack and their color is changed to the base color. As each node is popped off of the stack, its successors are processed similarly.

Expressions

Expressions are represented as a tree. Each node of the tree is unique -- it does not appear in any other tree. Chords may be linked to none, one, or more expressions depending on the Chord class. An ExprChord instance has one or two expression trees while a LoopHeaderChord instance has none.

The simplified expression class hierarchy is:

  Expr <- UnaryExpr <- AbsoluteExpr
                    <- BitComplementExpr
                    <- ConversionExpr
                    <- NotExpr
                    <- NegativeExpr
                    <- LoadValueIndirectExpr
                    <- FieldExpr <- LoadFieldAddressExpr
                                 <- LoadFieldValueExpr
       <- BinaryExpr <- AdditionExpr
                     <- SubstractionExpr
                     <- MultiplicationExpr
                     <- DivisionExpr
                     <- EqualityExpr
                     <- LessExpr
                     <- LessEqualExpr
                     <- MaxExpr
                     <- AndExpr
                     <- OrExpr
                     <- BitAndExpr
                     <- BitOrExpr
       <- NaryExpr <- CallExpr
                   <- VectorExpr <- PhiExpr
       <- LoadExpr <- LoadDeclValueExpr
                   <- LoadDeclAddressExpr
       <- DualExpr
       <- SubscriptExpr
       <- ValueExpr <- LiteralExpr
                    <- NilExpr
Each expression has a type and none, one, or more successor expressions. Other information may be associated with each expression.

When in SSA form, each LoadExpr instance has a link to the ExprChord instance that defines that version of the variable. Likewise, the ExprChord instance has a list of the LoadExpr instances that use the value defined. The SSA variables are actually different variables represented by a different instance of the VariableDecl class. Each such instance has a link to the original VariableDecl instance.

The DualExpr class is used to represent expressions that have both a high-level form and a low-level form. For example, subscript operations have a high-level form such as a[i] and a low-level form that uses address operations such as address_of(a) + i * sizeof(int). The high-level form is used in analysis such as dependence testing and the low level form is used for code generation. Each high-level expression provides a method that converts it to its low-level form.

The MayDef and MayUse are used to represent the information obtained from alias analysis.

The expression trees for c = a++ look like:

  ExprChord <- LoadDeclAddressExpr (temp)
     |      <- LoadDeclValueExpr (a)
     v
  ExprChord <- LoadDeclAddressExpr (a)
     |      <- AdditionExpr <- LoadDeclValueExpr (temp)
     |                      <- LiteralExpr (1)
     v
  ExprChord <- LoadDeclAddressExpr (c)
            <- LoadDeclValueExpr (temp)
In SSA form there would be a link from the first ExprChord instance to the two LoadDeclValue instances for \verb*temp*.

Optimizations

The current version of Scale performs only one loop optimization (loop permutation). Several scalar optimizations are performed: All but the last two require the CFG to be in SSA form and alias analysis to have been performed.

Several optimizations require dominance information. Scale provides methods for computing both the pre- and post-dominance information, the dominance frontier, and the iterated dominance frontier. If the optimization adds or deletes CFG nodes, this information will be re-computed on demand for the next optimization.

Scale provides reference information for variables that lists each CFG node and expression node that references a particular variable. This information is updated as an optimization adds or deletes references.

While not currently used, Scale provides a method to obtain the control dependence graph. This information was needed for research into branch prediction.

Each optimization can be selection by a command line switch. The order that the optimizations is not constrained.

The coalescing of SSA variable names when leaving SSA form is not strictly required. It is also the most expensive process because it does a live analysis of each SSA variable. We leave it in because it makes debugging the generated C or assembly code easier; the variable reference correspond more closely to the original program.

Our experiments have indicated that partial-redundancy elimination has very little effect on the 51 benchmarks in our regression tests. And, copy propagation is not required unless partial-redundancy elimination is performed.

Alias Analyses

Scale provides seven ways of performing alias analysis:
  1. None - no SSA-based optimizations can be performed.
  2. Simple - All variables whose address is taken are placed in the same alias category.
  3. Steensgaard based on [].
  4. Shapiro-Horowitz based on []. 1, 2, 4, or 8 categories are allowed.
  5. Inter-procedural Simple
  6. Inter-procedural Steensgaard.
  7. Inter-procedural Shapiro-Horowitz.
For the inter-procedural analysis all modules of a program must be compiled together. Every routine in the program is converted to its CFG form before analysis is performed. This results in very large compile-time memory requirements. Because of the memory requirements and because we currently have no optimizations that use the inter-procedural information, most compiles are performed using either method 1.

Experiments we have run to-date do not show any significant improvement of Steensgaard or Shapiro-Horowitz over the simple analysis on our set of 51 benchmarks.

Profiling

The user's program is first modified to produce the profiling information by inserting operations into the CFG. This insertion occurs just prior to code generation. The user requests this via a command line switch.

When the program is executed it produces a .pft file for every source module in the program. This file contains the profile information on the number of times a block, edge, or path in the program was executed.

On a subsequent compilation an optimization can request that this profile information be read and the CFG annotated with it. For edge profiling, the out-going CFG edges of a scale.score.chords.DecisionChord are given a probability of that edge being used.

In order for this to work, the CFG must be equivalent, in the structure of the graph edges when the profile information is read, to the CCFG when the profiling operations were inserted.

FAQ

I am a bit lost in understanding the structure of the Scale CFG. So, I have a few questions. Instances of the Chord class appear to represent a number of things. Is this correct? Or, are all Chord instances basic blocks?

Chord instances are nodes in the CFG. Scale does not explicitly represent basic blocks. Most Chord instances represent simple imperative operations and are of the form
   a <- b + c

There are methods that let you determine if a CFG node begins or ends a basic block. The simple definition of a basic block in Scale is

Is there code in Scale that would help me detect basic blocks?

Yes -
firstInBasicBlock()
returns the first CFG node (Chord) in the basic block containing chord
lastInBasicBlock()
returns true if this CFG node is the last node in a basic block.
SCC has examples of their use.

What does a loop look like?

The Loop class represents information about a loop. It is not the loop. The loop is in the CFG and has this basic form:
             |
             v
           LoopPreHeaderChord
             |
             v
   ------> LoopHeaderChord
   |         |
   |         v
   |       IfThenElseChord -------> LoopExitChord
   |         |
   |         v
   |        ....
   |         |
   |         v
   |<----  LoopTailChord
The loop exit test position may vary depending on whether it is in SSA form, what the source program form was, was l specified using the -O switch selected, etc. And, there may be more than one LoopExitChord.

Is the loop test always done at the top of the loop?

No - using l with the -O switch causes the loop test to be placed at the end of a loop and do-while loops naturally have the test at the end.

Is there a document with a graphical depiction of a CFG? I would like to be able to see what a loop looks like.

You can get a good idea by using DaVinci and the -sga or -sgb switches for the Scale class. Display only very simple programs or the display will be too complex.

Is there any documentation or examples that I could look at showing how to insert new sections of code and maintain the integrity of the CFG?

Unfortunately there is no such document. And, it is very important to keep the CFG information valid. There are three things that must be kept correct:
  1. The use-def links. This is the hardest and most important. PRE and ScalarReplacement do not do this fully as they both may generate multiple defs of the same variable. The other opts maintain the use-def links correctly. At some future time, we plan to make it easy to redo SSA for selected variables. Now, you may have to exit SSA form and re-enter it to make the use-def info correct.
  2. The variable reference information. One can simply call recomputeRefs() at the end of the optimization or one can update the information on the fly using
             refs.removeRefs(Chord s);
             s.recordRefs(References refs);
    
  3. The dominance information. If an optimization adds or deletes a CFG node, it should call scribble.recomputeDominators() before it terminates.
All of the optimizations in scale.score.trans show examples of keeping the CFG up to date. SCC, ValNum, PRE, LICM all insert new CFG nodes. It is very important to insert a NullChord before a CFG node that is deleted if the deleted CFG node has multiple in-coming CFG edges. This is due to a design flaw and is necessary to keep the Phi node information valid. The NullChords are deleted later. An example of this exists in SCC.

Can I exit SSA form and re-enter as another way of keeping the CFG valid?

Yes, this is a valid approach.

One area where you will have a problem is in the alias analysis. Alias variables must exist for all variables before entering SSA form.

Does the method ssa.removePhis() exit SSA? Would I then call ssa.buildSSA() again to recompute SSA form?

Yes. You would also have to decide if you should do ssa.coalesceVariables() to eliminate most SSA renamed variables. It would definitely be faster not to call this method and it is probably not necessary. However, if you do not call it after ssa.removePhis(), the names can never be coalesced. This may have an adverse effect on the backend generated code as there will be many more temporary variables. This may increase the register pressure and will definitely cause the register allocation to be slower.