EDA045F Program Analysis: Homework 3

Please try to track the time that you need for the homework assignment and report it, so that we can calibrate the effort for future classes.

Doop, Datalog, and Points-To Analysis

This is a group project and intended to be worked on in a team of two. If you have no partner, you can also submit alone.

Background

In this excerise, you will work with the Soufflé language, which is a dialect of Datalog. You can refer to the slides of lecture 6 for more information about Datalog. In addition, you will use the Doop framework for pointer anlaysis.

Resources

Building and Using Soufflé with our dataset

You can build Soufflé by running:

    ./configure && make -j4
  
(where the parameter to `-j' is intended to speed up compiling Soufflé itself through parallelisation; you can tune it for your system. This does not affect the execution speed for Datalog code.)

You can now run Soufflé as ./src/souffle, which provides suitable help. Alternatively, you can install Soufflé into some directory dir by running

    ./configure --prefix=dir && make -j4 && make install
  

If you have extracted our dataset into the directory javaccfacts, you can now run a Soufflé datalog program p.dl on our dataset via

    souffle -F javaccfacts p.dl
  
By default, Soufflé reads from and writes to the current working directory, so you can omit the -F javaccfacts parameters if you are calling it from within the facts directory.

You can use the -j T parameter to run Soufflé in multi-threaded mode with T threads.

Building and Using the Doop Fact Generator

To build the Doop Fact Generator, git clone https://bitbucket.org/yanniss/doop and then run ./doop from within the directory you just created. Doop will automatically recompile itself (and download dependencies via the gradle build tool) as needed.

While the Doop main program can automatically invoke Soufflé for analysis, we will only be using it to generate facts and then run Soufflé `by hand' (and only in the last task; facts for all other tasks can be downloaded above).

For reference, I used the following invocation to obtain the fact files that you found in javaccfacts.tar.gz using the following invocation:

  ./doop -Xstop-at-facts output/javaccfacts -l '/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar 2.exercise/junit4.jar' -i 2.exercise/javacc-5.0.jar --main 'jjdoc jjtree javacc org.javacc.jjtree.Main org.javacc.utils.JavaFileGenerator org.javacc.jjdoc.JJDocMain org.javacc.parser.Main'
  

Submission

1. The Type Hierarchy (1 point)

In this exercise, we examine the type hierarchy of javacc. The data has already been extracted by the Doop framework, so you only have to download and extract the javaccfacts.tar.gz archive and prepare Soufflé in order to get started. Recall that Soufflé has a notion of input relations that load data from an external file. The following setup allows you to load the type relations extracted from Java, assuming that you have previously declared DirectSuperclass and DirectSuperInterface as binary relations:

.input DirectSuperclass(IO="file", filename="DirectSuperclass.facts", delimiter="\t")
.input DirectSuperinterface(IO="file", filename="DirectSuperinterface.facts", delimiter="\t")
  
These direct supertype relationships describe each class' superclass, and each class' declared superinterfaces.

2. Callgraph analysis (1 point)

We continue our exploration of Soufflé, but this time we use the small datasets. We perform a points-to analysis that uses a version of the Doop points-to framework. For this exercise, we build on the definitions from minidoop.dl.

We will be utilising Doop's relations for our analysis. Note that Doop does not currently support invokedynamic, which in turn is not needed for representing our data set. Furthermore, its representation merges invokevirtual and invokeinterface, meaning that our program analysis only needs to consider three different forms of invocations:

TypeDescriptionExample
SpecialMethodInvocation Used for calling constructors, private methods, and super methods (inherited constructors and methods even if they are being overridden). new T();
StaticMethodInvocation Used for calling static methods (i.e., methods without a this): System.exit(0);
VirtualMethodInvocation Used for calling a `regular' public or protected method when the class or subclass type is known, or for an interface method HashSet<Integer> hs = ...; hs.size();

3. Object-Sensitive Points-to analysis (2 points)

We again use the small dataset.

Object-sensitive points-to analysis is a form of points-to analysis that distinguishes all derived information (call graph, reachability, ...) by the object that was the most recent receiver of a method call. For example, consider the following code:

    Class A {
        Object x;
        Object y;

        int f(Object xarg) {
            A aux = new A();
            aux.g(xarg);
            y = aux.y; // COPY
        }
        int g(Object yarg) {
            x = yarg;
        }

        static void entrypoint() {
            A a1 = new A();
            A a2 = new A();
            A a1alias = a1;
            a1.f("foo");
            a2.f("bar");
            a1alias.f();
        }
    }
  
Here, a1.f() and a2.f() would be analysed in different contexts, but a1.f() and a1alias.f() would be analysed in the same context. Since the calls to g() all go to the object allocated into variable aux, any contextual knowledge will get lost (assuming that heap objects are identified by allocation site, as in minidoop.dl). Thus, the analysis would determine (precisely) that a1.x can be "foo", a2.x can be "bar", and aux.y can be "foo" or "bar". This in turn means that the analysis would be imprecise in the line marked // COPY and conclude that both a1.y and a2.y could be "foo" or "bar".

4. Doop fact extraction (1 point)

Pick one additional Java program with suitable main entry points; you can refer to the examples from the previous exercise, use a small program of your own, or choose an existing executable Java program. Use Doop to obtain the facts for this program, and repeat the previous task.