Optimising Compilers

 

for “optimising” read “ameliorating”

 

Optimisation

  • Analysis (determining whether a dangerous thing is 'safe')

  • Transformation (doing something dangerous)

 

Flowgraphs

  • graph representation of a program

  • each node is a 3-address instruction

  • each edge represents a potential control flow

 

Basic blocks

  • sequences of instructions which have

    • exactly one predecessor (except for the first one)

    • exactly one successor (except for the last one)

 

Analysis type

  • scope

    • within basic blocks: local/peephole

    • between basic blocks: global/intra-procedural

    • whole program: inter-procedural

  • type of information

    • control flow

    • data flow

 

Dead code

int f(int x, int y) {

int z = x * y; #waste of time to compute as value not used

return x + y;

}

  • does this data ever arrive anywhere?

 

Unreachable code

int f(int x, int y) {

return x + y;

int z = x * y; #waste of space as cannot be executed

}

  • may control ever reach here?

 

Safety of analysis

  • most interesting properties are undecidable as reducible to the halting problem

    • must approximate, on the side of safety

 

 

Address taken instructions:

i lab1: add r0, r1, r2

...

j mov t33, #&lab1

j+1 mov t34, #&lab2

j+2 mov t35, #&lab3

...

k jmpi t32

  • in the above case the address taken instructions are those at lab1, lab2 and lab3

 

Reachability analysis

  • Over-estimating reachability, assume

    • both branches of a conditional may be taken

    • loops may always terminate

    • any address taken variable may be branched to

      • e.g in above must assume can jump from k to lab1, lab2 or lab3

      • an effect of this is if unrestricted GOTOs are permitted then all instructions may be reached

  • algorithm

    • mark the procedure's entry node (basic block) as reachable

    • mark every successor of a marked node as reachable iteratively until no further changes are made

  • NB we cannot decide in general if the inside of

if ( f(x) ) { z = y + 1 }

    will be executed. However we can do it to an arbitrary level of decision, e.g. spotting

if ( !true ) { ... }

if ( false && ... ) { ... }

if ( x != x ) { ... }

while ( true ) { ... }

  •  
    • this is particularly useful for copy propagation

  • is an overestimation of reachability

 

Copy propagation

bool debug = false

...

if (debug) { int z = x * y }

  • in this case code might become trivially unreachable, if only we do the slightly more complicated looking for specific cases mentioned above

 

Straightening

  • this is the coalescing of blocks which, through other optimisations, have only one possible control flow, e.g.

 

 

 

 

 

 

 

 

 

Inter-procedural analysis

  • a similar procedure to unreachable code analysis can be done as unreachable procedure analysis

  • assume an indirect call can go to all address taken procedures

 

Other optimisations

  • remove

      if (f(x)) {}

    • assuming f has no side effects

  • loop simplification

    • unroll the loop so that there aren't any branches

    • can even, if the mathematics is finite, do it then simply make an assignment

 

Liveness

  • is the value of this variable needed?

  • analysis: each line has an associated set of live variables

  • semantic liveness

    • a variable is semantically live at a node n if there is some execution sequence starting at n whose behaviour can be can be affected by changing the value of x

    • is undecidable

  • syntactic liveness

    • a variable is syntactically live at a node if there is a path to the exit of the flowgraph along which its value may be used before it is redefined

    • is an overestimate of liveness

  • algorithm: live variable analysis

    • backwards algorithm

    • where

        defined(n) are the variables defined at n

        ref(n) are the variables referenced at n

    • actual algorithm

      • start with set of live variables as {} for all lines

      • go from line i = 1 to line n doing live(i) at each stage

      • stop when no more changes are made

  • assumptions

1 int x, y, z, t, *p;

2 x = 1, y = 2, z = 3;

3 p = &y;

4 if ( ... ) p = &x;

5 *p = 7; ref = {p} def = {}

6 if ( ... ) p = &y;

7 t = *p; ref = {p, x, y} def = {t}

8 print z+t;

  •  
    • all address taken variables are referenced

      • e.g. in 7 above, any variable which p could reference is assumed to be live

    • no variables are defined – underestimate ambiguous definitions

      • e.g. in 5 above since it is unclear to which of x or y that p points, only p is considered to be referenced

 

Expression availability

  • useful for finding redundant computation

  • semantic availability

    • an expression is semantically available at node n if its value gets computed (and not subsequently invalidated) along every execution sequence ending at n

  • syntactic availability

    • an expression is syntactically available at node n if its value gets computed (and not subsequently invalidated) along every path from the entry of the flowgraph to n

if ( b ) {x = y * z}

if ( !b ) {x = y * z}

return x + y; x + y is available but is not semantically available

  • underestimating availability

  • data flow equations

    • gen(n) is the set of expressions generated at node n

    • kill(n) is the set of expressions killed at node n, i.e.

where xi is a variable assigned to at n

is the set of all expressions in the program which contain xi

  •  
    • since the algorithm requires an expression to be available on all incoming paths

      • with the exception of when pred(n) = {}, since{} = Universe

        • hence define, in this case, avail(n) = {}

  • alternate data flow equations

    • gen(n) : a node generates an expression e

      • if it must computer the value of e and

      • does not subsequently redefine any variables occurring in e

    • kill(n) : a node kills an expression e if

      • it may redefine some of the variables occurring in e and

      • does not subsequently recompute the value of e

    • NB, gen and kill here must not intersect

  • algorithm

    • forwards algorithm

for i = 1 to n do avail[i] = U

while (avail[] changes) do

for i = 1 to n do

avail[i] =

  •  
    • can speed up if initialise avail[1] = {}

  • assumptions, due to address taken variables

1 int x = 1, y = 2, z = 3, t, *p = &y;

2 if ( ... ) p = &x;

3 t = *p + 2; gen = {} kill = {t}

4 *p = 4 gen = {} kill = Ex Ey

  •  
    • underestimate ambiguous generation

      • e.g. in 3 cannot let x+2 or y+2 be generated as don't know which one

    • overestimate ambiguous killing: assume all expressions containing address taken variables are killed

      • e.g. in 4, don't know which of x or y is being assigned to so kill expressions using either

Data flow anomalies

  • anomalies may break or worsen (slow) the program

    • compiler should warn when the program is broken

  • types of anomalies

    • dead code

    • uninitialised variables

      • live variable analysis can detect this

      • if the set of live variables at the start of a given scope contains any variables local to that scope then a value may be used before it is initialised

    • write-write anomalies

      • when a variable is written twice with no intervening read

      • do forwards LVA-like analysis

        • add variables which are defined

        • remove variables which are referenced

      • normal problems with conditional statements

 

Clash graphs

  • for generating intermediate code we can use as many registers as we want

  • but need to find out which of these virtual registers need to be allocated into separate real registers

  • use LVA to find out when variables are simultaneously live

  • e.g.

mov x, #11 {}

mov y, #13 {x}

add t1, x, y {x,y}

mul z, t1, #2 {t1}

mov a, #17 {z}

mov b, #19 {a, z}

mul t2, a, b {a, b, z}

add z, z, t2 {t2, z}

 

Register allocation

  • graph colouring: use to do register allocation across a clash graph

  • minimum colouring for a graph is NP-hard, hence use heuristic algorithm

  • algorithm

    • choose a vertex which has the least number of incident edges (clashes)

      • remove the vertex and put it on a LIFO stack

    • repeat until graph is empty

    • pop each vertex from the stack

      • colour it in the most conservative way

      • avoids the colours of its already coloured neighbours

 

  • spilling

    • if we run out of registers we have to choose some registers to spill to memory

    • algorithm

      • choose a vertex with the least number of incident edges (clashes)

      • if it has fewer edges than there are colours

        • remove the vertex and push it onto a stack

        • otherwise choose a register to spill and remove the vertex

      • repeat until graph is empty

      • pop each vertex from the stack and colour it

      • any uncoloured vertices must be spilled

    • the method for choosing which to spill can vary

      • e.g. the least accessed one

        • but that doesn't take account of accesses in a loop

        • could assume all accesses in a loop counts as four accesses...

    • NB spilling actually requires extra registers

      • may need to restart colouring with fewer registers

  • moves between registers

      e.g. mov a, b

    • if we can colour a and b the same then we can remove the mov instruction

    • use a preference graph to show which registers appear in mov instructions together

  • non-orthogonal instructions

    • some architectures will

      • have to have its arguments in certain registers

      • zero certain registers when performing certain operations

      • etc.

    • have a set of virtual registers (e.g. v0 ... v31) which correspond to the real registers (r0...r31)

    • in generating intermediate code

      • use new virtual registers (r32, r33 ...)

      • when using instructions requiring inputs in a certain register generate preceding mov to move the value from the virtual register where it lives to the correct virtual register

      • when using instructions which put results in a certain register generate trailing mov to move the value from the specific virtual register to the new one associated with its variable in the source code

      • e.g.

        add can only perform on t0 = r1 + r2

x = 19

y = 23

z = x + y

  •  
    •  

        goes to

mov v32, #19

mov v33, #23

mov v1, v32

mov v2, v33

add v0, v1, v2

mov v34, v0

  •  
    •  
      • mov's will be eliminated during register allocation using the preference graphs

      • when instruction n is going to corrupt a given physical register ri

        • place an edge between vi and all other virtual registers live at instruction n

 

  • procedure calling standards

    • the techniques for non-orthogonal instructions are useful for implementing procedure calling as it includes

      • arguments/returns have to be in certain registers

      • certain registers are corrupted or preserved

 

Redundancy elimination

  • common sub-expression elimination

    • if an expression e is available on entry to an instruction which computes e the redundant computation can be modified or removed

    • algorithm

      • find a node n which computes an already available expression e

        • replace the occurrence of e with a new temporary variable t

        • on each control path backwards from n find the first instruction calculating e and add a new instruction to store its value into t

      • repeat until no more redundancy is found

    • problems

      • have introduced more moves, speed will depend on characteristics of target architecture

      • the new variable may have increased register pressure and called spilling

  • copy propagation

    • scan forwards from an “x = y” and replace x with y, as long as neither x or y have not been modified

    • easy to do in normal form

    • is useful to reduce the extra registers that can be introduced by common sub-expression elimination

    • makes CSE practical

  • code motion, includes

    • common sub-expression elimination

    • code hoisting

      • moving duplicated expression computations to the same places and combining them

      • relies on “very busy” expressions

    • loop invariant code motion

while ( ... ) {

x = a + b;

...

}

  •  
    •  
      • depends on discovering which assignments may affect the value of a variable “reaching definitions”

      • if none of the variables in an expression are redefined in terms of non-invariant variables can pull the evaluation out of the loop

    • partial redundancy

while ( ... ) {

... = a + b;

...

... = a + b;

}

=> ... = a + b;

while ( ... ) {

...

... = a + b;

}

  •  
    •  
      • requires a complicated backwards/forwards

Live range

  • user variables are often reassigned and reused many times in a program

    • they are live in different places

  • however this can lead to virtual registers having a large live range, and causing clashes

  • live range splitting – use different virtual registers to store a variable's value at different times

 

Static single assignment

  • each virtual register is only assigned to once

    • end up with not having to do live range splitting

  • similar to normal form for temporary variables

  • algorithm

    • go through the code in control flow order

      • each time a variable is reassigned rename it with an incremented subscript (and rename all forward occurrences)

      • e.g.

v1 = 3

v2 = v1 + y

  •  
    • when two control flow edges meet several differently named variables must be merged together

      • use the Θ function to keep track of which variables need to be merged at some control flow point

      • the Θ function is not executable

      • it disappears at register colouring: it requires that two branches have a value in the same register

 

Phase ordering problem

  • it's difficult to decide in which order to perform optimisations

  • some things which superficially improve programs may make it worse for another optimisation

 

Strength reduction

  • this is about replacing expensive operations with equivalent less expensive ones

  • algebraic identities indicate which operations can be interchanged

 

stage

line

Change to

1

4

t += *(a + 4*i*N + 4*j)

2

3

for (l = 0; l < 4*N; l += 4)

2

4

t += *(a + 4*i*N + l)

3

2

for (k = a; k < (a + 4*M*N); k += k*N

3

4

t += *(k + l)

4

3

for (l = k; l < 4*N + k; l += 4)

4

4

t += *l

 

 

Abstract interpretation

  • reasoning statically we are wanting to determine dynamic (run time) behaviour of a program

  • we can't run the whole program at compile time, but can run a simplified version

  • e.g. a map

    • sacrifices much detail, but is useful for planning a route

    • route is )probably) still valid back in real life

    • imagine doing Dijkstra in real life in your car...

  • adding integers

    • abstraction

      • (-) = {| z < 0}

      • (0) = {0}

      • (+) = {| 0 < z}

      • (?) = Z

      • [+] is an abstract operator on the above abstract values

    • simple computation

[+]

(-)

(0)

(+)

(-)

(-)

(-)

(?)

(0)

(-)

(0)

(+)

(+)

(?)

(+)

(+)

  • safety

    • for the above abstraction to be safe it is necessary to permit uncertainty (the (?))

    • e.g. with (-1515 + 37)

      • abs(-1515 + 37) = (-)

      • abs(-1515) [+] abs(37) = (?)

      • safe as (-) (?)

  • formally

    • concrete domain D

      • e.g. P(Z)

    • abstract domain D#

    • abstraction function α: D → D#

    • concretisation function γ: D# → D

      • e.g. (-) → {| z < 0}

        (0) → {0}, etc.

    • a function f from one concrete domain to a another

      • e.g. +: D * D → D

    • an abstract function f # between the corresponding abstract domains, which is an abstraction of f

 

 

Strictness analysis

  • call by value evaluation

    • efficient in space and time

    • may evaluate more arguments than necessary

  • call by name evaluation

    • only evaluated arguments when necessary

    • evaluates them repeatedly

    • also requires taking closures, putting them on the heap...

  • call by need

    • like call by name but copies to result so no multiple evaluation

  • replacing call by name with call by value

    • (λs.42) Ω

      • CBN evaluates to 42

      • CBV doesn't terminate

    • hence should CBN when we can, but only when it's safe

  • neededness

    • it will be safe to use CBV instead of CBN when an argument will definitely be evaluated

    • an argument is needed by a function if the function always evaluates it

    • actually too conservative

  • strictness

    • a function is strict in an argument if the function fails to terminate whenever the argument fails too terminate

    • hence if a function is strict in an argument we can pass it by value

  • recursion equations

    •  

        Ai is a symbol representing a built in function of arity ri

    • e ::= xi | Ai(e1 , ... , er_i) | Fi(e1 , ... , ek_i)

      F1(x1 , ... , xk_1) = e1

      ...

      Fn(x1 , ... , xk_n) = en

      where

    • e.g.

        plus(x,y) = if x = 0 then y else plus(x-1, y+1)

        plus(x,y) = cond(eq(x,0), y, plus(sub1(x),add1(y)))

  • standard interpretation

    • we need a representation of non-termination in our concrete domain

      • D = { _|_ }u Z

    • each built in function needs a standard interpretation

        cond(_|_, x, y) = _|_

        cond(0, x, y) = y

        cond(p, x, y) = x otherwise

        eq(_|_, y) = _|_

        eq(x, _|_) = _|_

        eq(x, y) = (x=y) otherwise

    • standard interpretation of a user defined function is constructed from the built in functions by composition and recursion

      • since the user defined function is defined in a similar way by its own defining equation this is not too hard

 

  • abstract interpretation

    • the thing we're interested in is if an expression may terminate given termination information about its inputs, or if it will always not terminate

      • i.e. we're interested in whether the function is strict in each of its inputs

      • hence the abstract domain D# = {0,1}, where

        • 0 indicates non termination

        • 1 indicates possible termination

    • for each built in function Ai we need a corresponding abstract function ai# to indicate its strictness in each of its arguments

      • this is the strictness interpretation for Ai

      • e.g.

          eq#(0,y) = 0

          eq#(x,0) = 0

          eq#(1,1) = 1

      • we can hence use Karnaugh maps and produce optimised boolean expression for the strictness interpretation

    • hence all we need now is an abstract interpretation fi# for user defined function Fi

      • this tells us how an arbitrary function is strict in its arguments

        • that permits us to know where we can replace CBN with CBV

      • for example, we can reduce plus#(x,y) to

          plus#(x, y) = x /\ (y \/ plus#(x,y))

        • we want a definition of plus# which satisfies this

          • actually want the least fixed point

          • algorithm

            •  
              •  

                  f#[i] = λ x .ei#

              • for u = 1 to n do

            • for i = 1 to n do f#[i] = λ x .0

              while (f#[] changes) do

            where

            •  

                each Ai replaced with ai#

                each Fj replaced with f#[j]

            • ei# means ei (from the recursion equations) with

              x is a vector of inputs x1 ... xn

        • this eventually reduces to x /\ y

 

Ai : builtins

ai : Dr_i D

where

D = Z U { _|_ }

ai # : 2r_i→ 2

where

2 = {0, 1}

 

Fi : user defined

fi : Dr_i D

fi# is a safe approximation, and the one thing we need to find out

Constraint based analysis

  • in LVA we generate a series of equality constraints

    • in-live(m) = (out-live(m) \ def(m)) u ref(m)

    • out-live(m) = in-live(n) u in-live(o)

    • ...

  • 0CFA

    • zeroth-order control flow-control analysis

    • in the presence of certain language features (like indirect calls) it is non-trivial to predict how control may flow at execution time (i.e. construct call graphs)

    • when functions (or pointer to functions) are present then this provides information about which functions may be called from a given site

    • let's do it for functional languages

        e ::= x | c | λx . e | let x = e1 in e2 | e1 e2

    • program points

      • each node on the abstract syntax tree is numbered as a program point

      • flow values for this language are

        • integers

        • function closures (λ expressions)

        • e.g. right they are 710 and (λx4.x5)3

    • each point i was an associated flow variable αi

      • this represents the set of flow values which may be yielded at program point i during execution

      • the precise value is undecidable, so we generate a safe approximation

      • we can generate a set of constraints on the flow variables and iteratively compute their least solution

    • generating constraints

syntactic object

program point

constraint

constant

ca

α a{ca}

variable

λxb . (... xa ...)

α aα b

 

let xb = ... in ... xa ...

 α aα b

function

(λxa . eb)c

α c{(λxa . eb)c}

let

(let xa = eb in ec)d

 α dα c

 

 

 α aα b

conditional

(if ea then eb else ec)d

 α dα b

 

 

 α dα b

function application

(ea eb)c

a(λxp . eq)r ) => α pα b

 

 

 a(λxp . eq)r ) => α cα q

 

  •  
    •  
      • a note on function application

        • we make the assumption that α a(λxp . eq)r by construction using the fact that earlier we defined that there would be such an flow value in the flow variable for a lambda expression

 

  • 1FCA

    • 0CFA is monovariant: each expression has only one flow variable associated with it

    • hence multiple calls to the same function allow multiple values into the single flow variable for the function body

      • these values leak out at all potential call sites

      • this is because of the α a(λxp . eq)r ) => α pα b rule above

        • note that each time a function is applied to an expression, a constraint is added saying that the flow variable of the expression is included in the flow variable of the dummy variable of the expression

    • 1CFA has a separate flow variable for each call site in the program

    • this is called polyvariant

      • another attempt to rectify this is the polymorphic approach

        • the values themselves are enriched to support specialisation at difference call sites

 

Inference based analysis

  • inference systems are used to determine program properties by a set of rules

    • such properties of a program usually depend recursively on the properties of sub expressions

  • an inference system specifies judgements

    •  

        e is an expression

        Γ is a set of assumptions about the free variables of e

        Φ is a program property

    • Γ |- e : Φ

  • type systems

    • (need to be able to do basic type system for simple lambda calculus)

    • if a type system has shown that a program is well typed execution can proceed without having to tag and track the types at run time

      • that gives overhead

    • the safety condition for this type system is

        {} |- e : t => [[e]][[t]]

        [[e]] and [[t]] are the denotations of e and t

        i.e. if e is well typed to type t then the value [[e]] obtained by evaluating e is in the set [[t]] (the set of all values of type t )

  • odds and evens

    • Φ ::= odd | even | Φ1 Φ2

    • the “typing” rules are the same as normal

    • e.g. find the “type” for

        Γ = {2: even , add: even → even → even , multiply: even → odd → even}

        e = λx . λy . add (multiply 2 x) y

        Γ |- e : ?Φ

    • safety condition

        {} |- e : t => [[e]][[t]]

        [[odd]] = {zZ | z is odd}

        [[even]] = {zZ | z is even}

        [[Φ1Φ2]] = [[Φ1]] → [[Φ2]]

  • richer properties

    • could permit

        Γ = { ... multiply: even → even → even , multiply: odd → even → even , ... }

        Γ = { ... multiply: even → even → even /\ multiply: odd → even → even /\ ... , ... }

    • these would require richer inference systems to handle these richer properties

 

Effects systems

  • this is a form of inference based analysis to take heed of side effects

  • language

    •  

        ζ represents some channel name

        ζ? reads an int from channel ζ, binds it to x (in e) and returns the result of evaluating e

        ζ!x.e1 e2 evaluates e1, writes the resulting integer to channel ζ, and returns the result of evaluating e2

    • e ::= x | λx.e | e1 e2 | ζ?x.e | ζ!x.e1 e2

      where

  • the typing rules for these operations are simple

 

 

 

 

 

 

  • there are some optimisations we'd like to be able to do but side effects limit them

    • e.g. transforming

        t = f() + f()

      into

        x = f()

        t = x + x

      is only possible if f() has no side effects

  • an expression may have effects F, which is a set containing elements of the form

    • Rζ : a read from channel ζ

    • Wζ : a write from channel ζ

  • functions

    • these do not have immediate side effects, but delayed side effects which will occur if the function is evaluated

    • hence we modify the type system so that it is instead

        t ::= int | t1 F t2

    • e.g.

        λx.ζ!x.x : int { Wζ } int

  • the inference system is modified to

  •  

      Γ |- e : t, F

  • effects sub-typing

    • extending the language by


      means that

        if x then λx.ζ!3.x+1 else λx.x+2

      is not typeable as

        λx.ζ!3.x+1 : int →{Wζ } int

        λx.x+2 : int → int

    • we can solve this issue by again extending the language to permit


  • optimisation

    • referentially transparent: an expression with no immediate side effects

    • referentially transparent expressions may be replaced with another expression of the same type and value, with no changes to program semantics

      • e.g. they may be safely removed if LVA says they are dead

  • safety condition

      {} |- e : t, F => [[e]] = (v, f) . v[[t]] /\ fF

  • optional extra structure

    • lists/sequences instead of sets

      • if we kept the effects in a list rather than a set we could get information about in which order and how many times they occurred

      • ζ1! y . ζ2? x . ζ1! x . x F = <Wζ1 , Rζ2 , Wζ1>

      • if we were tracking file accesses it would be important to ensure no reads/writes occurred after closing the file

      • if we were tracking memory allocation we would want to ensure no block is allocated twice

 

Instruction scheduling

  • this is target code level (target processor specific) optimisations

  • single cycle implementations

    • i.e. not pipelined

    • clever reordering is unlikely to yield any benefits

  • pipelined implementations

    • each functional units works independently and does its job in a single clock cycle

      • since we still have one instruction finishing per cycle but the cycles are much shorter then hopefully the program will run faster

    • pipeline hazards

      • these prevent the next instruction from executing in the next clock cycle

        • prevents the pipeline to run at full capacity

      • interlocked hardware

        • hazard causes the pipeline to stall

      • non-interlocked hardware

        • nops must be inserted explicitly by the compiler, to avoid errors

 

  •  
    • data hazards

      • feed forwarding is used so that we don't have to wait for WB stage in order to use calculated values

      • data hazards from loads

        •  

            1 lw r1, #0 (r0)

            2 add r2, r2, r1

            3 lw r3, #4 (r0)

            4 add r4, r4, r3

        • in this case the EX stage of 2 must wait for the MEM stage of 1 to have completed

        • rearranging the instructions can eliminate data hazards:

            1 lw r1, #0 (r0)

            2 lw r3, #4 (r0)

            3 add r2, r2, r1

            4 add r4, r4, r3

 

 

 

  • instruction dependencies

    • types of data dependencies

      • read after write

      • write after read

      • write after write

    • in reordering we may not reverse the order of instructions when one is dependent upon another

    • we want an ordering which

      • preserves the data dependencies

      • reduces hazards

    • for a basic block, represent dependencies by a directed acyclic graph (DAG)

      • A→B indicates that instruction B has a data dependency upon instruction A

      • any topological sort of this DAG will maintain the dependencies

    • finding the sort which causes the fewest possible pipeline stalls is NP complete

    • heuristic alternative

      • maintain a candidate list to contain the minimal elements of the DAG

      • heuristic:

        • each time we're emitting the next instruction we should choose one which

          • does not conflict with the previous emitted instruction

          • instructions which, if they were in a conflict, are likely to be the first instruction, the instruction relied upon (i.e. pull load operations up)

          • is as far away as possible (along paths in the GAD) from an instruction which can validly be scheduled last

      • algorithm

        • while the candidate list is not empty:

          • if possible emit a candidate instruction satisfying all three of the static scheduling heuristics, above

          • if no instruction satisfies all the heuristics, emit one which satisfies the last two

          • remove the instruction from the DAG and insert the newly minimal elements into the candidates list

    • dynamic scheduling

      • someone still needs to understand the principles (now it's the hardware designers)

      • some embedded systems will turn off dynamic scheduling to save power

        • we should still implement it in the compiler

 

Register allocation vs. instruction scheduling

  • phase ordering problem: allocation and scheduling are in contention

 

 

 

 

 

 

 

 

 

 

 

 

 

  •  
    • NB that we have a reduced number of stalls if we use more registers (less aggressive register allocation)

    • option: instead of re-using registers at the earliest opportunity allocate them cyclically

      •  
        • this should not increase the probability of spilling registers to memory

          • as that 'same' register will still be available when we would have used the 'different' (new) one

        • this hopefully does increase the probability that registers are not reused so aggressively

        • effect on allocation algorithm

          • satisfy the usual constraints

          • see how many physical registers are left over

          • for each unallocated virtual register try to choose a physical register which is distinct from all others allocated in the same basic block

 

 

Decompilation

  • decompilation is just compilation from target to an abstract syntax (parse) tree

  • reverse engineering

    • the process of increasing the level of abstraction, making it less suitable for implementation, but more suitable for comprehension and modification

  • legality/ethics

    • entities generally consider source code to be confidential intellectual property

    • end user license agreements usually specify that in agreeing to use the software one may not decompile it

      • EU regulations say that one can decompile it in order to achieve interoperability (e.g. to find out what TCP/IP calls MSN messenger makes to interact with central servers), if one already has the right to use the program

      • similar exceptions in US law

    • is complicated, especially with blurred interfaces between geographical jurisdictions

    • other methods of reverse engineering besides decompilation

      • clean room design:

        • this is done by having someone examine the system to be reimplemented and having this person write a specification

        • this specification is then reviewed by a lawyer to ensure that no copyrighted material is included

        • the specification is then implemented by a team with no connection to the original examiners

  • simple decompilation discards a lot of information

    • comments

    • function and variable names

    • structured control flow

    • type information

  • optimising compilation discards more

    • dead code

    • common sub-expressions

    • algebraic expressions rewritten

    • code and data are inlined

    • loops are unrolled

    • unrelated local variables are allocated to the same register

    • instructions are reordered by

      • code motion optimisations

      • instruction scheduling

  • compilation is not injective

    • many source programs may result in the same compiled code

    • we want to pick a reasonable representative source program

  • intermediate code

    • basic blocks can be extracted from the target code

    • a flowgraph can be created from these basic blocks

    • converting individual target instructions back into intermediate code 3-address instructions can be troublesome

      • many target instructions set registers etc., and recreating this exactly is laborious

      • we may have some interesting substitutions

        • original

            beq var1, 0xFFFFFFFF, label1

        • target

            add var1, #1

            bz var1, label1 branch if var1 is zero (or could do on overflow flag)

  • control reconstructions

    • we can recover high level control structure by examining the control flow of the flowgraph

    • loops

      • high level loops will have been compiled down into “spaghetti flow” tests and branches

      • node m dominates node n if control must go through m before it can reach n

        • a dominance tree: an edge connects a node with its immediate dominator

        • see below and left a dominance tree superimposed on a flowgraph

 

 

 

 

 

 

 

 

 

  •  
    •  
      • a back edge is an edge in the flowgraph from n to m when m dominates n

        • each back edge has an associated loop (see above right)

        • the back edge points to the loop header

        • the loop body consists of all the nodes from which the tail of the back edge can be reached without passing through the loop header (again), i.e. everything which is reachable from the header whilst going to the tail of the back edge, without actually traversing the back edge itself

      • while (...) {...}

        • in this case the loop header contains a conditional

        • the tail unconditionally goes back to the header

      • do {...} while (...)

        • the header unconditionally allows the body to execute

        • the tail tests whether the loop should execute again

    • conditionals

      • if (...) then {...}

        • test transfers control flow if some condition is true

        • of not then transfer control to another node (which control also reaches from the other branch too)

      • if (...) then {...} else {...}

        • switch control between two nodes, and control always reaches some other third node

    • we can do this for whatever control flow constructs are in the language

    • one a section of the flowgraph has been identified as some structure we can replace the whole structure with a single node

      • i.e. start with the smallest structures

 

  • type reconstruction

    • at target level types barely exist

      • memory contains arbitrary bytes

      • registers contain integers and floating point values of various bit widths

    • SSA and register allocation make discovering types more difficult

      • SSA splits one user variables into many variables (one for each static assignment)

      • register allocation can

        • spread these variables between several registers

        • make a single register hold the value of different variables at different times

      • hence giving types to registers doesn't make sense

    • we can deal with this by converting the target code to SSA form, e.g.

 

 

 

 

  •  
    • type reconstruction is achieved using constraint based analysis

      • each target instruction generates constraints on the types of the registers

      • we solve these constraints to assign types at the source level

      • constraints generate at call sites (between procedures) can help fill in gaps