EDA045F Program Analysis: Homework 2

Please try to track the time that you need for the homework assignment and report it, so that we can calibrate the effort for future classes.

Dataflow Analysis 2

This is a group project and intended to be worked on in a team of two. If you have no partner, you can also submit alone.

Background

In this excerise, you will again work with the Soot, though this time we will take advantage of Soot's built-in analysis frameworks.

Resources

Using Soot

In this exercise, you will use Soot as a library, rather than merely extending its frontend (as in the first exercise). That means that you need to programmatically enable / disable options; the stub source code points you to all the relevant options.

Debugging Soot

Below are some hacks for making it easier to debug Soot code that I have found to be useful. Feel free to suggest your own, and I will add them:

Submission

1. Finding Inefficient Code (2.5 points)

The Java Collections framework defines lists, sets, and maps, and a general supertype Collection for sets and lists. This allows programmers to use lists in code like the following:

    LinkedList<Integer> l = new LinkedList<>();
    ...
    f(l);

    void f(Collection<Integer> set) {
        for (int i ... ) {
            if (set.contains(i)) {
                ...
            }
        }
    }
  
Such code would usually be much more efficient if we were using a set type instead of LinkedList. In this exercise, you will write an analysis to help programmers find such `inappropriate' uses of lists.

Note that Java internally distinguishes between multiple kinds of method invocations that are all reflected in Jimple. Some of them (invokespecial and invokestatic) tell us precisely what code to call, while others (invokedynamic and invokevirtual) only give us a supertype of the callee and the name of the invoked method. Resolving these calls requires the use of call graphs.

TypeDescriptionExample
SpecialInvokeExpr Used for calling constructors, private methods, and super methods (inherited constructors and methods even if they are being overridden). new T();
StaticInvokeExpr Used for calling static methods (i.e., methods without a this): System.exit(0);
VirtualInvokeExpr Used for calling a `regular' public or protected method when the class or subclass type is known HashSet<Integer> hs = ...; hs.size();
InterfaceInvokeExpr Used for calling an interface method when the Java compiler only knows the interface type. Set<Integer> s = ...; s.size();
DynamicInvokeExpr Mainly intended for dynamically typed languages that run on the JVM, this instruction determines its invocation target by calling a helper method. Oracle's Java 8 compiler (though not JastAdd) also uses the instruction for constructing Lambda abstractions. Function f = i -> i - 1;
The types InterfaceInvokeExpr, SpecialInvokeExpr, VirtualInvokeExpr have the common supertype InstanceInvokeExpr, which may be convenient for this task.

2. Worst Case Execution Time (2.5 points)

Worst-case execution time analysis (WCET) is an important analysis in real-time and control systems engineering. This analysis determines the maximum amount of time that the execution of a certain subroutine will take, considering all possible control paths. (In practice, this also has stringent requirements on the hardware and the operating system).

Consider the following language:

    id  : identifiers
    nat : natural numbers

    val ::= nat | id | "true" | "false" ;

    expr ::= <value> + <value>
           | <value> - <value>
           | <value> == <value>
           | <value> < <value>
           | <value>
	   ;

    stmt ::= id "=" <expr> ";"
           | "skip" ";"
           | "twice" <stmt> ";"
           | "{"  <stmt>* "}"
           | "if" <expr> "then" <stmt> "else" <stmt>
           | "while" <expr> "do" <stmt> "done"
           | "return" <expr> ";"
  
Assume that the semantics of all expressions are analogous to Java, with support for boolean and integer types, and the skip instruction acting as a `no-op'. The semantics of twice s; are to execute the statement s twice in a row.

There are several approaches for computing worst-case execution time, but they all hinge on knowing how often each individual statement may be executed.