Package scale.score.dependence

Provides dependence testing of array references in the CFG.

See:
          Description

Class Summary
AffineExpr A class to represent affine expressions used by the data dependence tester.
DataDependence The base class for computing array data dependences.
DDEdge This class is the base class for data dependence edges.
DDGraph This class represents the data dependence graph.
DDInfo A class which represents data dependence information between two array references.
DDNormalEdge This class represents a set of dependence edges from one source to one sink in the data dependence graph.
DDTransEdge This class represents the set of edges, in the data dependence graph, that have a distance of 0 and the same direction for some array in some loop.
 

Package scale.score.dependence Description

Provides dependence testing of array references in the CFG.

This is a description of how we include alias information into our SSA representation.

Our baseline technique for including alias information in SSA form is described in the paper by Chow et. al. It helps to understand this paper to understand our extensions. First wesummarize Chow's solution to incorporating alias information in SSA form . Then, we will discuss our extensions.

The main contribution by Chow et. al. is the use of virtual variables to represent indirect operations which unifies the treatment of indirect and scalar variables for SSA analyses and optimizations. The treatment of aliases is based upon the work by Choi et. al. which distinguishes between MayDefs (preserving definitions or weak updates) and MayUses. Chow et. al. add Chi and Mu operators to model MayDef and MayUse effects, respectively. The Chi and Mu operators work on virtual variables rather than scalar variables. Thus, a definition (Chi) of a virtual variable is a definition of a group of actual variables that share similar alias characteristics. Correspondingly, a Mu operator represents a use.

A virtual variable represents a group of program, or user, variables that have similar alias characteristics. We originally used Steensgaard's alias analysis algorithm to create these groups which we also call abstract locations (abstract location and virtual variable can be used interchangeably). For example, in the sequence, p = &a; p = &b, a virtual variable is created to represent a, b, and *p. This means that, conservatively, any use of *p may use a or b. A very conservative approach may create one virtual variable for every user variable.

Our basic algorithm works as follows:

  1. Perform alias analysis to identify the virtual variables
  2. Perform a pass over the program and insert Chi and Mu nodes for the virtual variables.
  3. Convert to SSA with the virtual variables
In step 2, we insert a Mu node for each use of an aliased user variable. We insert a Chi node for each definition of an aliased user variable. We also insert a Chi at function calls for each actual aliased actual parameter. Using our small example from above, if we create a virtual variable, _v, to represent a, b, and *p then we add a Mu(_v)/Chi(_v) to each use/definition of a, b, or *p. Note - the Chi(_v) is actually an assignment of the form _v = Chi(_v).

When the CFG is converted into SSA form, the virtual variables as as well as the user variables are renamed. This means that Phi nodes are created for the virtual variables and the virtual variables obey the single assignment rule of SSA form.

The scheme as described above works because each indirect variable (e.g., *p) represents at most one virtual variable (i.e., p points to at most one abstract location). Unfortunately, the scheme does not work when p may point to multiple locations. In the Shapiro/Horwitz algorithm, p may point to multiple locations. We need to extend our alias SSA representation to handle the Shapiro analysis.

Again, using the example from above, let's say the Shapiro algorithm partitions the user variables such that p->{a} or p->{b} (i.e., p points to a or to b, but a and b are not aliases). This is different from the Steensgaard algorithm in which p->{a, b} (i.e., a and b are aliases). To incorporate the Shapiro results into SSA form, we need a different scheme to represent the MayUse and MayDefs. For example, a use/def of a is not a use/def of b, but a use/def of *p may be a use/def of a or b.

We extend the basic scheme for including alias information by changing the meaning of a virtual variable. In our new scheme, a virtual variable represents multiple locations and we attach information to the variable to distinguish between the locations. We assign a superset and subset name to the virtual variable. Essentially, the superset name represents indirect variables (e.g., *p), and the subset name represents user variables. In our example, we can represent the relations among the variables using set notation, where a virtual variable represents the set {*p, {a}, {b}}. This means, a and b belong to distinct subsets but they are members of the same enclosing set which includes *p.

In theory, our extended algorithm for adding alias information to SSA isn't much different that the original.

  1. Perform alias analysis to identify the virtual variables
  2. Perform a pass over the program and insert Chi and Mu nodes for the virtual variables.
  3. Convert to SSA with the virtual variables
In Step 1, we create virtual variables using the new name scheme. The number of subset names depends upon the number of categories used in the Shapiro pointer analysis. In Step 2, inserting the Chi and Mu nodes is slightly different. We insert a Chi/Mu for a superset virtual variable for a definition/use of an indirect variable. We insert a Chi/Mu for a subset virtual variable for a definition/use of a user variable. In Step 3, we also convert a CFG to SSA, but we ignore the subset names for the purposes of creating the the MayDef/Use links. The subset names must be preserved for the optimizations to make use of the more precise alias information.

Note that the MayDef/Use links (and SSA numbers) ignore the subset name. But, the optimizations must check for the subset names when propagating information. Analysis information should only be propagated when the virtual variables are equivalent. Two virtual variables are equivalent under the following conditions: 1) The subset and superset names are the same, 2) One virtual variable only contains a superset name and both the superset names are the same.

The superset and subset distinctions are intended to help the optimization phases. We only want to propagate information for equivalent virtual variables. That is, when propagation information associated with subset name, we ignore different subset names but we must perform an appropriate action for a virtual variable with the same subset name or superset name. This is the main difference between the old scheme and the new scheme. In the old scheme, analysis information is always propagated down may def/use links. But, in the new scheme, we selectively propagate analysis information down the may def/use links.

Using our example, we create 3 virtual variables, _v#p, _v#p.a, and _v#p.b (our implementation actually uses numbers, so it's _v#0.0, _v#0.1 and _v#0.2, but we use the letters to hopefully make things clearer. The description of constant propagation (CP) below is not meant to match the algorithm in Scale. Instead, we just attempt to illustrate how an optimization phase should use our SSA extensions.

We use the following program to illustrate the extensions (given that p->{a}, or p->{b}.

  a = 1;
  b = 2;
    = *p;
    = b;
    = a;
Using Steensgaard analysis (or Shapiro's analysis with 1 category), we cannot propagate 1 to the use of a or 2 to the use of b. But, with Shapiro's analysis with multiple categories we can. We annotate the program with the Chi and Mu information after converting to SSA form
  a = 1;  _v#p.a_1 = Chi(_v#p.a)
  b = 2;  _v#p.b_2 = Chi(_v#p.b_1)
    = *p; Mu(_v#p_2)
    = b;  Mu(_v#p.b_2)
    = a;  Mu(_v#p.a_2)
In the example, CP associates 1 with _v#p.a_1 and _v#p.b_1 after statement 1. The may def-use link for _v#p.a_1 points to the use of _v#p.b_1 at statement 2. Since the virtual variable at statement 2 is not equivalent to the virtual variable at statement 1, CP does not propagate the value 1 to _v#p.b_1. Instead, CP associates 2 with _v#p.b_2 and _v#p_2. Then, CP follows the may def-use links to the Mu nodes at statements 3, 4, 5. At statement 3, _v#p_2 is 2 so the use of *p can be replaced by 2. The same action occurs at statement 4 for b. At statement 4, a can be replaced by 1.

If there is control flow in the program:

  if ()
    a = 1; _v#p.a_1 = Chi(_v#p.a)
  else
    b = 2; _v#p.b_2 = Chi(_v#p.b_1)
         _v#p_3 = Phi(_v#p_1,_v#p_2)
   = *p;   Mu(_v#p_3)
   = b;    Mu(_v#p.b_3)
   = a;    Mu(_v#p.a_3)
In this case, CP associates not a constant with _v#p_3, _v#p.a_3, and _v#p.b_3 at the Phi node. Then, none of the uses can be replaced by a constant.

Note - in the presence of control flow, we only need to add Phi functions for superset names because the definition of a superset is equivalent to definitions for each of the subsets.