EDAN26 Multicore Programming Lab 6

The course home page is here.

The purpose of this lab is to let you explore:
  1. Download the file lab6.zip or use:

    wget fileadmin.cs.lth.se/cs/Education/edan26/labs/lab6/lab6.zip

  2. Go to the downloaded directory. First an optional activity for users of the nvi and vim editors:

  3. Edit the file src/main.rs and run make (in the current directory, not in src)

    You will get a compilation error (and some warnings about unused variables). What does the error mean?

  4. Fix this! See beginning of Lecture 9 for hints (and possibly end of Lecture 8).

    Make again.

  5. Complete the program so that it prints PASS.

  6. Experiment with some larger inputs.

  7. Now go to the clojure directory and type

    time clojure swish.clj

    Only the first line of output from time, the one labeled real, is interesting (and relatively correct :)

    Initially only four accounts and ten transactions, using one thread, are used.

    At the end all accounts are printed and you can see they all have the start-balance.

    Fix this! See Lecture 11 for hints.

  8. Then note the time, increase the number of accounts in steps of e.g. 1000, and note whether it takes more time to create the accounts.

  9. Then increase the number of transactions until it takes noticeably longer.

  10. Then increase the number of threads until the program no longer gets faster.

  11. Then decrease the number of accounts until it takes noticeably longer. Will it take "forever" if you have really few accounts?

  12. Next go to the C directory, which contains the original C file from Lab 3. You might want to edit your Pthreads solution from Lab 3 instead. If you use the solution from Lab 3, make sure your account structs contain only the balance and no mutex (to save memory).

  13. Now modify the swish function to work on a transaction, using the syntax

    __transaction_atomic { /* code... */ }

  14. Compile with gcc -fgnu-tm swish.c, i.e. without optimization!

  15. Unfortunately, you will see a compilation error. Only "transaction safe" functions may be called from a transaction.

    Change the code to:

    void __attribute__((transaction_safe)) extra_processing()
    {
        volatile int i;
        for (i = 0; i < PROCESSING; i += 1)
            ;
    }

    This tells GCC it is safe to call the function. Recompile!

  16. Unfortunately, you will see another compilation error. In C, compilers regard a volatile variable as "dangerous to optimize": unlike a normal variable, accessing it may have side effects such as I/O.

    It is clear this cannot be tolerated in a transaction which may be retried.

    Remove the volatile flag and recompile!
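
    Steps 13 to 16 can be put together roughly as follows. This is a minimal sketch and not the lab's actual swish.c: the account_t type, the swish signature, the PROCESSING value, and the TM_ATOMIC, TM_SAFE and USE_GNU_TM macros are all assumptions made for illustration. The macros only let the same file build both with and without -fgnu-tm.

```c
/* Hypothetical build switch (not part of the lab files):
 *   gcc -fgnu-tm -DUSE_GNU_TM swish.c
 * enables real transactions. Without it the macros expand to nothing. */
#ifdef USE_GNU_TM
#define TM_ATOMIC __transaction_atomic
#define TM_SAFE   __attribute__((transaction_safe))
#else
#define TM_ATOMIC
#define TM_SAFE
#endif

#define PROCESSING 100      /* assumed value; the lab file defines its own */

typedef struct {
    int balance;            /* balance only, no mutex */
} account_t;

/* No volatile on i: a volatile access is not allowed in a transaction. */
void TM_SAFE extra_processing(void)
{
    int i;

    for (i = 0; i < PROCESSING; i += 1)
        ;
}

/* Move amount between two accounts as one transaction.
 * Returns 1 if the transfer was made, 0 if funds were insufficient. */
int swish(account_t *from, account_t *to, int amount)
{
    int ok = 0;

    TM_ATOMIC {
        if (from->balance >= amount) {
            extra_processing();
            from->balance -= amount;
            to->balance += amount;
            ok = 1;
        }
    }

    return ok;
}
```

    Built with gcc -fgnu-tm -DUSE_GNU_TM, the body of swish runs as a single transaction. Built without those flags it is plain sequential code and is only safe with one thread.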

  17. When it compiles successfully, use the following command to check that the POWER transactional memory instructions are really used:

    objdump -d a.out | grep tbegin

  18. Experiment with varying the number of accounts, threads, and transactions. How many transactions per second can you achieve?
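
    One way to get a transactions-per-second figure is to time the whole run and divide. The helpers below are a sketch under assumptions: the function names are made up for illustration, and where the timestamps are taken in your program is up to you.

```c
#include <time.h>

/* Seconds between two timestamps, e.g. taken with clock_gettime(). */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Throughput for a run that completed the given number of transactions. */
double transactions_per_second(long transactions, double seconds)
{
    return (double)transactions / seconds;
}
```

    A typical use would be to call clock_gettime(CLOCK_MONOTONIC, &start) before the threads are created and again after they have all been joined, so that thread startup is included in the measurement.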

  19. Compare the performance with your solution from Lab 3.

  20. Now compile using optimization, for example -O3 which is the highest level for GCC.

  21. Increase the amount of PROCESSING until it becomes absurd. How is the performance affected? Why, do you think?

  22. Next compile the program mm.c, which performs matrix multiplication, and try both GCC and Clang at different optimization levels.

    gcc -fexpensive-optimizations -mcpu=power8 -O3 mm.c && time ./a.out

    and then

    gcc -ftree-parallelize-loops=80 -fexpensive-optimizations -mcpu=power8 -O3 mm.c && time ./a.out

    The second command tries to use 80 threads in parallelizable loops.

  23. Parallelize it using OpenMP! See Lecture 8 for hints.
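
    As a starting point, a parallel loop over the rows of the result could look like the sketch below. It is not the code in mm.c: the matmul name, the row-major layout, and the parameter list are assumptions. The directive only takes effect when compiling with -fopenmp, for example gcc -fopenmp -O3.

```c
/* c = a * b for n x n matrices stored in row-major order. */
void matmul(int n, const double *a, const double *b, double *c)
{
    /* Each thread computes a disjoint set of rows of c. The variables
     * declared inside the loop body are private to each thread. */
    #pragma omp parallel for
    for (int i = 0; i < n; i += 1) {
        for (int j = 0; j < n; j += 1) {
            double sum = 0.0;

            for (int k = 0; k < n; k += 1)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

    Splitting on the outermost loop means no two threads write the same element of c, so no locking is needed. Without -fopenmp the pragma is ignored and the function runs sequentially with the same result.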

  24. Finally, use the IBM parallelizing compiler without OpenMP directives. HOT means high-order transformations.

    xlc -qarch=pwr8 -O5 -qsmp -qhot=level=2 mm.c && time ./a.out

    As xlc warns, accuracy can be reduced with -O3 and higher. Sometimes that is dangerous, and at other times waiting ten days instead of one for an ocean weather prediction may be more dangerous.
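
    One common reason aggressive optimization can change floating-point results is that the compiler reorders additions, and floating-point addition is not associative. Whether xlc actually does this for mm.c is an assumption here; the sketch below only shows that the order of a sum can matter. The function names are made up for illustration.

```c
/* The same three numbers added in two different orders. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

    With a = 1e20, b = -1e20 and c = 1.0, sum_left gives 1.0 because the two large values cancel first, while sum_right gives 0.0 because 1.0 is lost when it is added to -1e20.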

  25. Optional: can you modify the OpenMP code to make it faster than the IBM code? Use any compiler and options.

Sat Oct 12 15:10:30 CEST 2019