Lund University, Faculty of Engineering, Department of Computer Science
EDA045F Program Analysis: Homework 4
Please try to track the time that you need for the homework
assignment and report it, so that we can calibrate the effort for future classes.
Dynamic Analysis
This is a group project and intended to be worked on in a team of two. If you have no partner, you can also submit alone.
Background
Resources
- [Tasks 1,3] The DaCapo fop benchmark: [fop.tar.gz]
- [Task 1] AspectJ downloads: (official site) (AspectJ ships with many operating systems / IDEs, so check first if you already have a local distribution that you can use.)
- [Task 1] AspectJ programming guide: [html] (external)
- [Task 1] AspectJ quick reference: [pdf] (external)
- [Task 1] AspectJ thisJoinPoint reference: [html] (external)
- [Task 2] Matrix multiplication code in C: [matrix-multiply.c]
- [Task 2] Statistical plots in R: box plots, violin plots
- [Task 2] Statistical plots in Python (matplotlib): box plots, violin plots
- [Task 2] Perf tool documentation: perf examples and documentation by Brendan Gregg
- [Task 3] Helper tools for working on fop with Soot: [soot-fop-start.tar.gz]
- [Task 3] Compressed fop callgraph, computed by Soot's SPARK callgraph analysis: [fop-callgraph-spark.csv.gz]
- [Task 3] Soot 3.1.0: [jar]
- [Task 3] Soot 3.1.0 packaged documentation: [jar]
- [Task 3] Soot documentation: [HTML] (local copy)
- [Task 3] Fundamental Soot objects: [HTML]
- [Task 3] FOP method lookup table: [fop-method-lookup-table.csv.gz] (new)
Submission
- Deadline: Wednesday the 27th of November (end-of-day)
- Submission mechanism: Moodle
1. Profiling Instrumentation with AspectJ (1.5 points)
While some developers use AspectJ for aspect-oriented software development, another use of the system
is in instrumentation. In this task, you will work with AspectJ to instrument the well-known benchmark fop.
Note that the fop.tar.gz package comes with a bash script run-fop.sh that runs fop in the same style as in the DaCapo benchmark suite; this
script should be sufficient for you to execute the program and measure its execution time (e.g., using the command line tool time).
Note that the run-fop.sh script takes an optional parameter, which is a classpath. You can use this to pass in a modified jar file (plus the AspectJ runtime support library aspectjrt.jar).
To perform static AspectJ weaving, you can use the following (given a file MyInstrumentation.aj that you have to provide):
ajc -warn:none -Xlint:ignore MyInstrumentation.aj -outjar fop-instr.jar -inpath ./fop-all.jar
The extra parameters suppress warnings and errors caused by references from unused code to external library dependencies.
Note: Depending on your installation of ajc, you may have to increase the amount of heap space available to the program. For example, here is the ajc script that I use (note the -Xmx8G, which increases the heap space):
#!/bin/sh
AJPATH=/usr/share/java/aspectjrt.jar:/usr/share/java/aspectjtools.jar
exec "${JAVACMD:=java}" -classpath "$AJPATH${CLASSPATH:+:$CLASSPATH}" \
"${JAVA_OPTS:=-Xmx8G}" \
org.aspectj.tools.ajc.Main "$@"
- Task:
- Instrument fop with AspectJ to compute which methods take up how much time.
- You can use System.nanoTime() to get the current system time.
- Time usage should be exclusive: if method f calls g, then the time spent in g should not be counted against f.
- fop is single-threaded, so you can safely use global variables.
- If you run out of stack space during execution, check whether the runaway recursion is caused by your instrumentation recursively triggering further instrumentation.
- To print your output, you may find it helpful to introduce a new main entry point that calls the original fop entry script.
- Validate that the total time consumption of all methods is lower than the total run time.
- Measure the runtime overhead of your instrumentation, following the evaluation methodology discussed in class. By what fraction does your execution time increase?
- How did you measure the execution time? Explain.
- Generate a file table.csv: What are the top 100 methods, by time consumption? List them in a table consisting of the method's signature (obtained via thisJoinPointStaticPart.getSignature()) followed by the tabulator character ("\t") followed by the total number of nanoseconds (in decreasing order).
- Submit the following:
- Your AspectJ source code (extended with sampling)
- Answers to the questions listed above.
- The generated table.csv file.
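For illustration only (this is not part of the required submission, and the names are made up): the exclusive-time bookkeeping that your AspectJ advice needs to perform can be sketched language-neutrally. A stack of active frames charges each method only for the time not spent in its callees:

```python
import time
from collections import defaultdict

# Sketch of exclusive-time accounting, assuming one enter/exit event pair
# per method call (as before/after advice would deliver). fop is
# single-threaded, so plain globals suffice.

exclusive_ns = defaultdict(int)   # method signature -> exclusive nanoseconds
stack = []                        # frames: [signature, entry_time, child_time]

def on_enter(signature):
    stack.append([signature, time.monotonic_ns(), 0])

def on_exit():
    signature, entry, child_time = stack.pop()
    total = time.monotonic_ns() - entry
    # Charge only the exclusive share to this method ...
    exclusive_ns[signature] += total - child_time
    if stack:
        # ... and report our full time to the caller as child time.
        stack[-1][2] += total
```

The same pattern carries over to AspectJ: the enter/exit hooks become advice bodies, and the signature comes from thisJoinPointStaticPart.getSignature().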
2. Hardware Performance Counters (1.5 points)
On most modern operating systems, performance counters can be accessed programmatically (e.g., using PAPI on Linux).
On Linux, we can even use a helper command line program, called perf, to analyse performance counters without
having to write our own code.
Such performance counters can report on operating system statistics (network / paging / hard disk activity), but also on CPU
statistics (number of CPU instructions retired, number of cycles executed, etc.).
- Task:
- Assume that you have been asked to analyse the performance of a matrix multiplication function. You are given an implementation of the function and its helpers (matrix_power and all functions called from there), which you are not expected to modify, together with a benchmark that a software engineer put together in an attempt to help you.
- Download the matrix multiplication benchmark matrix-multiply.c
- Compile the benchmark with a recent version of gcc at different optimisation levels (-O0, -O1, -O2, -O3).
- Analyse the execution time of the matrix multiplication by re-running the program at least 10 times and analysing the data using box plots or other statistical significance methods that you know.
- Explain: (note that there may be multiple correct answers, as long as they are consistent)
- How did you measure execution time?
- What computations did you explicitly exclude/include in the measurements?
- Why?
- Use the Linux perf program to better understand how the benchmark uses memory (NB: you may have to run your code with superuser privileges): measure the number of Level 1 data cache read accesses (loads) for the different optimisation levels
- Between which optimisation levels does the number of memory loads change? Use a plot to examine whether the changes are significant (e.g., the whiskers of the box plots do not overlap).
- Write 3-4 sentences to reflect on your observations. (You are not expected to know gcc's optimisations or your CPU's instruction set; just share insights and questions that come to your mind based on what you have measured.)
- Submit the following (with plots as png, pdf, or eps files):
- Explanations to the questions listed above
- Execution time of the program, given different optimisation settings
- Level 1 data cache usage of the program to support your analysis
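As an aside, the "whiskers don't overlap" criterion can be made concrete without any plotting library. The sketch below (in Python; the numbers in the usage example are made-up placeholders, not real measurements) computes Tukey-style box-plot statistics from a list of repeated run times:

```python
import statistics

# Sketch: box-plot statistics for a sample of run times, assuming at least
# ten measurements per optimisation level. Whiskers follow the common
# Tukey convention: the most extreme samples within 1.5 * IQR of the box.

def box_stats(samples):
    """Return (lower whisker, Q1, median, Q3, upper whisker)."""
    q1, med, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo = min(s for s in samples if s >= q1 - 1.5 * iqr)
    hi = max(s for s in samples if s <= q3 + 1.5 * iqr)
    return lo, q1, med, q3, hi

def whiskers_overlap(a, b):
    """True if the whisker ranges of the two samples overlap."""
    lo_a, *_, hi_a = box_stats(a)
    lo_b, *_, hi_b = box_stats(b)
    return lo_a <= hi_b and lo_b <= hi_a
```

If whiskers_overlap returns False for the run times of two optimisation levels, you have a simple argument that the difference is unlikely to be measurement noise.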
3. Call-Graph Instrumentation with Soot (2 points)
The static call graph computations that we have examined so far have
been attempts to approximate the actual program call graph. In this exercise, we compute the dynamic call graph of a program
(our fop benchmark from Task 1) and compare it against a statically computed callgraph.
Soot can be instructed to transform Java code by using the -outjar option and writing a suitable BodyTransformer. The compiled transformer can then
be executed e.g. by running:
java Stub4 -w -cp ./fop-all.jar:./xlog.jar:/opt/jdk1.8.0_144/jre/lib/rt.jar -process-dir ./fop-all.jar -outjar -allow-phantom-refs
This will produce the file sootOutput/out.jar by processing fop-all.jar. (Note that the output jar will not include auxiliary data such as graphics and some system class files stored in fop-all.jar).
The produced output jar file will then have been transformed as per the instructions in Stub4.
- Task:
- Make sure to install fop again, if needed, and download and extract the "Helper tools for working on fop with Soot". The package contains:
- Stub4.java, a helper file / starting point for implementing the dynamic call-graph analysis.
- sootfiltered-fop.tar.gz, which contains all the contents of fop that Soot filters out when processing fop-all.jar
- fix-fop.sh, a shell script that merges a generated out.jar with sootfiltered-fop.tar.gz. These last two files are purely for your convenience, and you can manually add the files that Soot strips from fop-all.jar to the generated out.jar, if you prefer.
- Download, decompress and briefly examine fop-callgraph-spark.csv.gz. This is the static callgraph computed by Soot's SPARK framework, for fop. In the following, we will examine how accurate this call graph is.
- Edit Stub4.java so that all invokevirtual, invokeinterface, invokespecial, and invokestatic calls (the program does not use invokedynamic) are recorded in the same style as for the static callgraph (see Stub4.java for details).
- Use Soot with your instrumenter to produce an updated out.jar file. Merge this file with the soot-filtered data from sootfiltered-fop.tar.gz (cf. fix-fop.sh) to produce a modified runnable fop benchmark.
- Run the modified fop benchmark to produce a dynamic callgraph.
- Answer the following questions:
- How many call edges are missing in the dynamic call graph, compared to the static call graph?
- How many call edges are missing in the static call graph, compared to the dynamic call graph? List all of them in a file missing-static.csv, one per line.
- Build a file conservative.csv consisting of tab-separated triples (callsite, number-of-statically-predicted-callees, number-of-dynamically-observed-callees) for all callsites where the number of dynamically observed callees is at least 2 lower than the number of statically predicted callees.
- What fraction of call sites listed in the static call graph were never exercised in the dynamic call graph?
- What fraction of call sites observed in the dynamic call graph is represented by conservative.csv?
- Write 4-5 sentences on your impression of the precision of the static call graph, based on the above numbers.
- Submit the following:
- Your Java code
- Your answers to the above questions, including generated csv files.
- Hints:
- You may want to test your code on a smaller program (e.g., on one of your tests from an earlier exercise).
- Remember that calls may come both from InvokeStmt and from AssignStmt. Cover both cases.
- invokestatic and invokespecial are not very interesting, so you can filter them out, but make sure to filter them out of both call graphs if you do so.
- You can use Soufflé for analysing the csv files, though many other languages (such as Python) also support convenient tab-separated csv processing.
- Look up the comments in Stub4.java. If you follow the advice there to split the processing of dynamic calls into a separate helper library (consisting of a single class with a single static method), you can often avoid re-running Soot while you fix bugs in dynamic call graph generation.
- new: You can use fop-method-lookup-table.csv.gz (after extracting it) to look up methods. The table has the format (class, methodname, Soot-target, Soot-methodname), meaning that if the dynamic type is class and the static name of the method is methodname (using java.lang.reflect method naming) or Soot-methodname (using Soot naming), the actually invoked method will be Soot-target (using Soot method naming).
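To give an idea of the set arithmetic behind the comparison questions, here is a sketch in Python. The exact column layout of the callgraph csv files is an assumption here (edges are modelled as (callsite, callee) pairs); adapt the tuple construction to the real format:

```python
import csv
from collections import defaultdict

# Sketch: compare a static and a dynamic call graph, each given as a set
# of (callsite, callee) edges. The csv layout assumed by read_edges is
# hypothetical; adjust it to the actual file format.

def read_edges(path):
    with open(path, newline="") as f:
        return {tuple(row) for row in csv.reader(f, delimiter="\t")}

def compare(static_edges, dynamic_edges):
    missing_static = dynamic_edges - static_edges   # observed but not predicted
    missing_dynamic = static_edges - dynamic_edges  # predicted, never observed

    # Group callees per callsite to find over-conservative predictions.
    static_by_site = defaultdict(set)
    dynamic_by_site = defaultdict(set)
    for site, callee in static_edges:
        static_by_site[site].add(callee)
    for site, callee in dynamic_edges:
        dynamic_by_site[site].add(callee)

    conservative = [
        (site, len(static_by_site[site]), len(dynamic_by_site[site]))
        for site in dynamic_by_site
        if len(static_by_site[site]) - len(dynamic_by_site[site]) >= 2
    ]
    return missing_static, missing_dynamic, conservative
```

The fractions asked for above then follow directly from the sizes of these sets relative to the sizes of the two input graphs.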