Package scale.score.trans

This package provides various optimizations that operate on the CFG.

Class Summary
AASR This class replaces array element address calculations in a loop with simple additions.
BasicBlockOps Perform optimizations on each basic block.
CP Perform copy propagation optimization on a Scribble graph.
DeadVarElimination This class performs dead variable elimination.
ExprMap Map from an expression to another expression.
GlobalVarReplacement This class replaces references to global variables with references to local variables.
Inlining This class performs inlining.
LICM This transformation moves CFG nodes to the outer-most basic block in which they may be executed correctly.
LoopPermute This optimization permutes the loops of a routine based on cache order.
LoopTrans The base class for all loop transformation optimizations.
Noopt This class performs no optimization.
Optimization This class is the base class for all optimizations performed on the CFG.
PRE Perform the Partial Redundancy Elimination optimization.
ScalarReplacement This class replaces references to array elements with references to scalar variables.
SCC This class performs sparse conditional constant propagation.
SFIR This class replaces references to fields of C structures with references to local variables.
TreeHeight Perform expression tree height reduction on a Scribble graph.
URJ This class performs unroll & jam.
UselessCopy This class removes useless copies from the CFG.
ValNum Global value numbering optimization.
 

Package scale.score.trans Description

This package provides various optimizations that operate on the CFG.

Does the Optimization Improve the Program?

When an optimization considers performing some transformation, there are two questions that must be answered:

  1. Can the transformation be performed? That is, if the transformation is made, will the transformed program be correct? In this context a correct program is one that performs as it is coded.
  2. Should the transformation be performed? This is a harder question to answer because the answer depends on how well the transformed program performs. Does it execute faster? Does it require less space?

We will call question one the legal question and question two the beneficial question.

Because the interactions between optimizations can be very complex, some transformations may not appear to be beneficial on their own but may enable an optimization that is very beneficial. One transformation may also inhibit a more beneficial transformation.

Most transformations result in an increase in register pressure. If the register pressure, for some sequence of generated instructions, becomes too great, register spilling results. Spilling a register may negate any benefit the transformation may otherwise have provided. If the register pressure becomes large enough, the transformed program may perform worse than the un-optimized program.

Answering question two optimally in any particular context is very difficult, so heuristics are usually used. Many of the Scale optimizations are written to assume that the answer to question two is always the same as the answer to question one; if it can be done, it will be done.

How to Add a New CFG Optimization

To add a new optimization to be performed on the CFG representation of a user function do:
  1. Create the Java class and place it in the scale/score/trans directory. This class must be derived from the Optimization class. Note that the Inlining optimization is a special case because it is performed over all functions at the same time. Therefore, it is not derived from the Optimization class.
  2. Make sure that your optimization defines the requiresSSA() method correctly. This method tells the compiler what state the CFG must be in for your optimization to be performed.
  3. Make sure your optimization either leaves the CFG in the same state in which it received it or calls the invalidSSAForm() method.
  4. Modify the Scale class to add the information needed to find and invoke your optimization.
    1. Choose a unique alphabetic letter to represent your optimization.
    2. Add that letter to the optimizationLetter field.
    3. Add the name of your optimization to the optimizationName field in the same position as you added your letter in the previous step.
    4. Add the name of your optimization class to the optimizationClass field in the same order.
    5. Decide if and in what order your letter should be added to the cannedOpts field. The elements of this array correspond to the five optimization levels (i.e., -O0, -O1, -O2, -O3, -O4).
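The overall shape of such a class can be sketched as follows. This is a hedged illustration, not Scale's actual code: the Optimization base class is stubbed out here, and only requiresSSA() and invalidSSAForm() are named by this package documentation; every other field and method is an assumption.

```java
// Stub standing in for Scale's real Optimization base class. The constant
// names and invalidSSAForm() come from this package's documentation;
// the ssaValid field and leftSSAValid() are purely illustrative.
abstract class Optimization {
    static final int NO_SSA = 0, IN_SSA = 1, VALID_SSA = 2, NA_SSA = 3;
    private boolean ssaValid = true;
    abstract int requiresSSA();                     // state the CFG must be in
    void invalidSSAForm() { ssaValid = false; }     // report broken SSA form
    boolean leftSSAValid() { return ssaValid; }
}

// A hypothetical new pass, registered in scale/score/trans.
class MyOpt extends Optimization {
    int requiresSSA() { return VALID_SSA; }         // requires valid SSA form
    void perform() {
        // ... transform the CFG here ...
        // If the transformation violated either SSA rule, report it:
        invalidSSAForm();
    }
}

public class OptSkeleton {
    public static void main(String[] args) {
        MyOpt o = new MyOpt();
        o.perform();
        System.out.println(o.requiresSSA() == Optimization.VALID_SSA);
        System.out.println(o.leftSSAValid());
    }
}
```

The compiler can then query requiresSSA() before running the pass and leftSSAValid() afterward to decide whether SSA form must be rebuilt.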

Notes

SSA Form and Optimizations

Most optimizations are performed on the CFG when it is in SSA form. SSA form provides variable liveness information that is not available otherwise. However, some optimizations (e.g., GlobalVarReplacement, SFIR) require non-SSA form because they are not able to easily keep the SSA form valid. Some optimizations don't care either way (e.g., TreeHeight, BasicBlockOps).

The SSA form of the CFG is valid if two conditions are met:

  1. There is one and only one def of a variable that is in SSA form. (URJ sometimes violates this rule.)
  2. The lexicographical order of the defs of the renamed SSA variables is maintained. (PRE violates this rule.)
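The two rules can be illustrated with a tiny straight-line example (the renamed variables x1 and x2 are illustrative; Scale's actual renaming scheme may differ):

```java
public class SsaExample {
    public static void main(String[] args) {
        // Non-SSA: the single variable x has two defs.
        int x = 1;
        x = x + 2;

        // SSA renaming: each def introduces a fresh variable (rule 1),
        // and the renamed defs keep the original lexical order (rule 2).
        int x1 = 1;
        int x2 = x1 + 2;

        System.out.println(x == x2);
    }
}
```

An optimization that adds a second def of x2, or that moves the def of x1 below the def of x2, would violate rule 1 or rule 2 respectively and would have to call invalidSSAForm().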

The compiler is designed to allow the user to specify the order in which the optimizations are performed. For this reason, each optimization is required to specify what form it requires the CFG to be in and what form it leaves the CFG in.

The requiresSSA() method specifies what form the CFG is required to be in:

NO_SSA
The optimization requires that the CFG not be in SSA form.
IN_SSA
The optimization requires that the CFG be in SSA form.
VALID_SSA
The optimization requires that the CFG be in valid SSA form.
NA_SSA
The optimization does not require SSA form or non-SSA form.

The compiler assumes that an optimization leaves the CFG in the same SSA state as it originally received it. However, an optimization can specify that it made the SSA form invalid (violated one or both of the two requirements listed above) by calling the invalidSSAForm() method.

Spanning the CFG

Most optimizations have to visit every CFG node. Scale provides several ways of spanning the CFG and there are examples in every optimization. The easiest method is to use a visit pattern. For example,
(1)    Stack wl = WorkArea.getStack("xxx");
(2)    Chord.nextVisit();
(3)    wl.push(begin);
(4)    begin.setVisited();

(5)    while (!wl.isEmpty()) { 
(6)      Chord c = (Chord) wl.pop(); 
(7)      c.pushOutCfgEdges(wl);
(8)      ...
       }

(9)    WorkArea.returnStack(wl);
  1. Obtain a Stack to be used as a work list. We use our own version of the stack implementation to avoid the overhead of synchronization. We use the WorkArea class to obtain previously used stacks in order to avoid allocating new stacks and to avoid the allocations required to grow the stack. A unique identification is supplied so that if the code fails to return the stack (see (9)), it is possible to determine what place in the compiler obtained it.
  2. Since the CFG is a cyclic graph, the spanning algorithm must know whether a node has already been visited. This statement assigns a unique integer value to this traversal; if a node has been colored with this value, it has been visited.
  3. Start with the first node of the CFG. Every CFG has exactly one node that has no predecessors; the getBegin() method can be used to obtain it.
  4. Mark this node as visited. A node is marked as visited as it is placed on the work list.
  5. If the work list is empty, the CFG has been spanned.
  6. Remove a node from the work list.
  7. Place all of this node's successors that have not been visited on the worklist. See pushOutCfgEdges().
  8. Do the processing required for this node.
  9. Return the stack so it can be used again.

It is possible to span the CFG in other orders, and there are examples throughout the compiler. For instance, the CFG can be spanned so that no node is processed before all of its immediate predecessors have been processed. See pushChordWhenReady().

An important thing to remember about the above code example is that there is only one color that can be assigned to a CFG node. Thus, if the processing in step 8 must also span the CFG, the two spanning algorithms will collide. To overcome this problem, a different spanning algorithm can be used for one of them.

    Stack   wl   = WorkArea.getStack("xxx");
    HashSet done = WorkArea.getSet("xxx");

    wl.push(begin);
    while (!wl.isEmpty()) {
      Chord c = (Chord) wl.pop();
      c.pushOutCfgEdges(wl, done);
      ...
    }

    WorkArea.returnStack(wl);
    WorkArea.returnSet(done);

In this example a set of visited nodes is used in place of coloring each node.
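The set-based traversal can be sketched as a self-contained program. Here a toy Node class and plain java.util collections stand in for Scale's Chord, WorkArea stack, and set; all names in this sketch are illustrative, not Scale's actual API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SpanExample {
    // Toy stand-in for Scale's Chord CFG node.
    static class Node {
        final String name;
        final List<Node> succs = new ArrayList<>();
        Node(String name) { this.name = name; }
        // Mirrors pushOutCfgEdges(wl, done): push successors not yet visited.
        void pushOutCfgEdges(Deque<Node> wl, Set<Node> done) {
            for (Node s : succs)
                if (done.add(s))      // add() is false if already in the set
                    wl.push(s);
        }
    }

    public static void main(String[] args) {
        // Build a small cyclic CFG: begin -> a, a -> b, a -> end, b -> a.
        Node begin = new Node("begin"), a = new Node("a"),
             b = new Node("b"), end = new Node("end");
        begin.succs.add(a);
        a.succs.add(b);
        a.succs.add(end);
        b.succs.add(a);               // back edge forming a loop

        Deque<Node> wl = new ArrayDeque<>();
        Set<Node> done = new HashSet<>();
        done.add(begin);              // mark visited when placed on the list
        wl.push(begin);

        List<String> order = new ArrayList<>();
        while (!wl.isEmpty()) {
            Node c = wl.pop();
            c.pushOutCfgEdges(wl, done);
            order.add(c.name);        // process the node
        }
        System.out.println(order.size()); // each node visited exactly once
        System.out.println(order.get(0));
    }
}
```

Despite the cycle between a and b, the set guarantees each node is pushed at most once, so the loop terminates after visiting all four nodes.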

Dominators

Various optimizations require information on dominators. The compiler computes this information the first time it is requested. (See getDomination().) If an optimization changes the CFG by adding or deleting a CFG node, then the optimization must either update the dominator information itself or inform the compiler that it is no longer valid by calling recomputeDominators().

Dominance Frontier

Various optimizations require information on the dominance frontier of a node. The compiler computes this information the first time it is requested. (See getDominanceFrontier().) If an optimization changes the CFG by changing the dominance frontier of a CFG node, then the optimization must tell the compiler that the dominance frontier information is no longer valid by calling recomputeDominators().

References

Various optimizations require knowing where in the CFG each variable is referenced. The compiler computes this information the first time it is requested. (See getRefs().) The References class provides various methods for accessing and updating this information. An optimization can force the compiler to recompute this information as well by calling the recomputeRefs() method.

Loops

The compiler maintains a significant amount of information concerning each loop in the CFG. The compiler computes this information the first time it is requested. An optimization that changes the loop structure of the CFG (e.g., loop fusion, loop permutation) must inform the compiler that this information is no longer valid by calling the recomputeLoops() method.

Partial Redundancy Elimination (PRE)

PRE doesn't do much, and it is expensive. It requires an additional constraint on the SSA form that is valid only prior to the operation of any other optimization. For that reason, Scale leaves SSA form and re-enters it if PRE is used after some other optimization. PRE is not under regression test.

Global value numbering is much more effective.

FAQ

Some optimizations don't seem to be effective on Fortran code.

The problem may be that Fortran passes arguments by reference. If an index variable is passed to a subroutine, some optimizations will skip that variable. Also, if the DO loop is controlled by variables passed in as arguments, Scale may fail to find important loop information.