Lecture 5 – notes

This is a short description of what we did during lecture 5. I spent most of the time going through examples which I hope can be of help to you when you're about to solve lab 3 and the project – the slides are here.

Event sourcing

First we discussed ways to implement bank accounts – one simple way is to just keep the state of an account in a mutable attribute:

class Account {

  private int accountNumber;
  private double balance;

  public void deposit(double amount) {
    balance += amount;
  }

  public double withdraw(double amount) {
    var actual_amount = Math.min(amount, balance);
    balance -= actual_amount;
    return actual_amount;
  }

  public double balance() {
    return balance;
  }
}

This is conceptually very simple, but there are some drawbacks:

  • We know the balance of the account, but we don't know why it is the way it is.
  • To update the account, we need to make sure that no one else updates it at the same time, so we need to use some kind of lock.

Because of this, it's become commonplace to save state updates instead of the state itself – in the case of bank accounts we would save all transfers, and then calculate the balance by going through all transfers (we could also 'cache' the balance at some point in time, and then just look at transfers after it). This is called event sourcing, and it gives us the account's history, and allows us to explain why we have the current balance. It also makes it 'cheaper' to alter the state of the accounts, we only need to add new transfers (we'll talk more about this in a few weeks time).

In lab 2 you'll need to think about how to keep track of the number of available seats for a performance – there are several ways, one of them is using event sourcing.

About DBMS's and SQLite

We then looked a drawing illustrating how we can use a database server:

I said that a DBMS normally runs as a server, and we talked about different ways of communicating with our database server:

  • Using a notebook, such as the Jupyter notebooks we've used for lectures and lab 1.
  • Using some kind of dedicated graphical client (a GUI), such as sqlitebrowser – I showed it to you briefly.
  • Using a command line SQL-client, such as psql for PostgreSQL, or sqlite3 for SQLite (SQLite doesn't really run as a server, but when we run the command line client, it looks like it).
  • Using a program written in a traditional language, such as Java, or Python.
  • Using some kind of microservice (typically REST) – this is just an example of the previous category, but it's very common, and they're typically used as backends for various web frontends – we'll return to this on Thursday (and on lab 3 and the project).

Using SQLite from a command line client

After the lecture I got some questions about what programs I was using, and it made me realize that some of you are not used to working with a command line interpreter (CLI) – I'd be happy to show you, and answer any question about it during the QA sessions.

On the screen for this part of the lecture I had two windows:

  • To the left I had a text editor (you can use any good text editor for writing SQL and Python code, I think VS Code is a safe bet – personally I use Emacs, and it's great, but it comes with some big caveats).

    In the text editor I edited the SQL script (setup.sql), and the Python programs (list-applications.py, inflate-grades.py and admit.py).

  • To the right I had a window in which I ran a command line interpreter (CLI) – it is a program which reads commands and executes programs (this program is often called a shell, and there are many to choose from, and the're very similar to each other – if you're running Windows you can use PowerShell, on Mac you can use whatever runs in your Terminal program, and if you're on Linux (as I am), you can use bash, zsh, fish, or whatever).

I started the SQLite 'client' by issuing the command sqlite3 in my CLI (the rightmost window) – first I did it without giving any filename:

$ sqlite3
SQLite version 3.49.1 2025-02-18 13:38:58
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>

Changes we do here will just be saved to an in-memory database, and we would lose all our data when we exited (unless we explicitly save it, see below). To use a specific file to save our database, we just enter it as we start sqlite3:

$ sqlite3 colleges.sqlite
SQLite version 3.49.1 2025-02-18 13:38:58
Enter ".help" for usage hints.
sqlite>

Here we can do two different kinds of things:

  • We can enter SQL commands – these are the common suspects, SELECT, INSERT, UPDATE, CREATE TABLE, etc.
  • We can enter SQLite commands – these are commands for reading from files, saving to files, etc, they're all prefixed by a ., so to read a database from the file colleges.sqlite (from lecture 2), we write (sqlite> is just the prompt):

    sqlite> .open colleges.sqlite
    

    There are many useful sqlite3 commands, during the lecture I used:

    • .schema, which shows how our tables are defined
    • .print, which prints a message when we run a script
    • .mode, which alters the output format of our queries (we tried columns, boxes and html)
    • .drop (see below)

During lecture 4, we looked at an example with college applications, they are in the tables students, colleges, majors, programs, and applications, and we can use the sqlite3 command .dump to see SQL statements which would define the same tables, and the data in them:

$ sqlite3 colleges.sqlite
SQLite version 3.49.1 2025-02-18 13:38:58
Enter ".help" for usage hints.
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE students (
  s_id         INT,
  s_name       TEXT,
  gpa          REAL,
  PRIMARY KEY  (s_id)
);
INSERT INTO students VALUES(123,'Amy',3.899999999999999912);
INSERT INTO students VALUES(234,'Bob',3.600000000000000088);
INSERT INTO students VALUES(345,'Craig',3.5);
INSERT INTO students VALUES(456,'Doris',3.899999999999999912);
INSERT INTO students VALUES(543,'Craig',3.399999999999999912);
INSERT INTO students VALUES(567,'Edward',2.899999999999999912);
INSERT INTO students VALUES(654,'Amy',3.899999999999999912);
INSERT INTO students VALUES(678,'Fay',3.799999999999999823);
INSERT INTO students VALUES(765,'Jay',2.899999999999999912);
INSERT INTO students VALUES(789,'Gary',3.399999999999999912);
INSERT INTO students VALUES(876,'Irene',3.899999999999999912);
INSERT INTO students VALUES(987,'Helen',3.700000000000000177);
CREATE TABLE colleges (
  c_name       TEXT,
  state        TEXT,
  enrollment   INT,
  PRIMARY KEY  (c_name)
);
INSERT INTO colleges VALUES('Stanford','CA',15000);
INSERT INTO colleges VALUES('Berkeley','CA',36000);
INSERT INTO colleges VALUES('MIT','MA',10000);
INSERT INTO colleges VALUES('Cornell','NY',21000);
CREATE TABLE majors (
  major        TEXT,
  full_name    TEXT,
  credits      TEXT,
  PRIMARY KEY  (major)
);
INSERT INTO majors VALUES('CS','Computer Science','300');
INSERT INTO majors VALUES('EE','Electrical Engineering','300');
INSERT INTO majors VALUES('bioengineering','Bioengineering','300');
INSERT INTO majors VALUES('biology','Biology','300');
INSERT INTO majors VALUES('history','History','300');
INSERT INTO majors VALUES('marine biology','Marine Biology','300');
INSERT INTO majors VALUES('psychology','Psychology','300');
CREATE TABLE programs (
  major        TEXT,
  c_name       TEXT,
  enrollment   INT,
  PRIMARY KEY  (major, c_name),
  FOREIGN KEY  (major) REFERENCES majors(major),
  FOREIGN KEY  (c_name) REFERENCES colleges(c_name)
);
INSERT INTO programs VALUES('CS','Berkeley',6);
INSERT INTO programs VALUES('CS','Cornell',3);
INSERT INTO programs VALUES('CS','MIT',5);
INSERT INTO programs VALUES('CS','Stanford',6);
INSERT INTO programs VALUES('EE','Cornell',3);
INSERT INTO programs VALUES('EE','Stanford',5);
INSERT INTO programs VALUES('bioengineering','Cornell',5);
INSERT INTO programs VALUES('bioengineering','MIT',3);
INSERT INTO programs VALUES('biology','Berkeley',5);
INSERT INTO programs VALUES('biology','MIT',3);
INSERT INTO programs VALUES('history','Cornell',5);
INSERT INTO programs VALUES('history','Stanford',6);
INSERT INTO programs VALUES('marine biology','MIT',5);
INSERT INTO programs VALUES('psychology','Cornell',3);
CREATE TABLE applications (
  s_id         INT,
  c_name       TEXT,
  major        TEXT,
  decision     CHAR DEFAULT ('N'),
  PRIMARY KEY  (s_id, c_name, major),
  FOREIGN KEY  (s_id) REFERENCES students(s_id),
  FOREIGN KEY  (major, c_name) REFERENCES programs(major, c_name)
);
INSERT INTO applications VALUES(123,'Stanford','CS','Y');
INSERT INTO applications VALUES(123,'Stanford','EE','N');
INSERT INTO applications VALUES(123,'Berkeley','CS','Y');
INSERT INTO applications VALUES(123,'Cornell','EE','Y');
INSERT INTO applications VALUES(234,'Berkeley','biology','N');
INSERT INTO applications VALUES(345,'MIT','bioengineering','Y');
INSERT INTO applications VALUES(345,'Cornell','bioengineering','N');
INSERT INTO applications VALUES(345,'Cornell','CS','Y');
INSERT INTO applications VALUES(345,'Cornell','EE','N');
INSERT INTO applications VALUES(678,'Stanford','history','Y');
INSERT INTO applications VALUES(987,'Stanford','CS','Y');
INSERT INTO applications VALUES(987,'Berkeley','CS','Y');
INSERT INTO applications VALUES(876,'Stanford','CS','N');
INSERT INTO applications VALUES(876,'MIT','biology','Y');
INSERT INTO applications VALUES(876,'MIT','marine biology','N');
INSERT INTO applications VALUES(765,'Stanford','history','Y');
INSERT INTO applications VALUES(765,'Cornell','history','N');
INSERT INTO applications VALUES(765,'Cornell','psychology','Y');
INSERT INTO applications VALUES(543,'MIT','CS','N');
COMMIT;
sqlite>

We saw that we can also run SQL commands and SQLite commands given as command line parametrs – this enabled me to write:

$ sqlite3 colleges.sqlite ".dump" > create-college.sql

which created a simple textfile create-college.sql with the SQL statements above (the > is a redirection command, which takes the output of one command and saves it in a text file).

Given this file we could create a new database, with only the five tables above (now using <, which sends the content of a text file as input to another command):

$ sqlite3 colleges.sqlite < create-colleges.sql

and we could then see all the students:

$ sqlite3 colleges.sqlite
SQLite version 3.49.1 2025-02-18 13:38:58
Enter ".help" for usage hints.
sqlite> SELECT * FROM students;
s_id  s_name  gpa
----  ------  ---
123   Amy     3.9
234   Bob     3.6
345   Craig   3.5
456   Doris   3.9
543   Craig   3.4
567   Edward  2.9
654   Amy     3.9
678   Fay     3.8
765   Jay     2.9
789   Gary    3.4
876   Irene   3.9
987   Helen   3.7
sqlite>

We then used the text editor (to the left of the screen) to write a script which showed the names and grades for all students who had applied for Computer Science at Stanford.

I wrote it in a text file called list-applications.sql, and the SQL code was:

SELECT    s_id, s_name, gpa
FROM      applications
          JOIN students USING (s_id)
WHERE     major = 'CS' AND c_name = 'Stanford'
ORDER BY  random();

I ran this by issuing the command sqlite3 colleges.sqlite < list-students.sql in the CLI window (on the right side of the screen):

$ sqlite3 colleges.sqlite < queries.sql
┌──────┬────────┐
│ s_id │ s_name │
├──────┼────────┤
│ 123  │ Amy    │
│ 876  │ Irene  │
│ 987  │ Helen  │
└──────┴────────┘

In lab 2 you'll have to create your own database file, and then sqlite3 will be invaluable.

Calling SQLite from Python

SQL is great for doing queries, inserts and updates of a database, but it's highly specialized for those purposes – it would be a nightmare to write traditional programs using SQL.

So we want a way to write code in some other language, and have it use SQL for what it's good for – in this course we'll use Python, which has great support for calling SQL.

In Python's standard library, there is a package called sqlite3, and it allows us to create a Connection to a database, and then creating a Cursor working with the database (sending queries and updates).

To use a SQLite3 database colleges.sqlite in Python we can import the sqlite3 library, and just write:

import sqlite3

db = sqlite3.connect('colleges.sqlite')

Now db is connected to our database, and we can create a cursor in which we can execute SQL statements using the execute command:

c = db.cursor()

We wrote a Python program which took a major and a college name as input, and then ran the same query as we saw above:

import sqlite3
import sys


db = sqlite3.connect('colleges.sqlite')


def main():
    major, college = sys.argv[1:]
    c = db.cursor()
    c.execute(
        '''
        SELECT    s_id, s_name, gpa
        FROM      applications
                  JOIN students USING (s_id)
        WHERE     major = ? AND c_name = ?
        ORDER BY  random();
        ''',
        [major, college]
    )
    for id, name, gpa in c:
        print(f'  {id}: {name} ({gpa})')


main()

(I wanted to introduce the idea of randomness in the output, as a preparation for an upcoming example, that's why we wrote ORDER BY random() in the query).

The question marks in the query are values we send into our query, and they are put in a list as the second argument to the c.execute() method.

The result of the c.execute() statement is that the cursor c will be an iterator over tuples with the results of the SELECT statement – in this case we get 3-tuples with ids, names, and gpas, and we can unpack them with (as above):

for id, name, gpa in c:
    print(f'  {id}: {name} ({gpa})')

We ran it as (I'm using uv to run my Python programs, but you can use any Python installation you want):

$ uv run list-applications.py CS Stanford
987: Helen (3.7)
123: Amy (3.9)
876: Irene (3.9)

At least I think it's hard to imagine a simpler way to call SQL code from another language!

As an example of updating the database, we wrote a small program which inflates the grades of students applying to programs in a given state. The program is meant to run as:

$ uv run inflate-grades.py CA

So, we let the user enter the state in question as a command line argument:

import sqlite3
import sys


db = sqlite3.connect('colleges.sqlite')


def main():
    state = sys.argv[1]
    c = db.cursor()
    c.execute(
        '''
        UPDATE   students
        SET      gpa = min(4.0, gpa * 1.04)
        WHERE    s_id IN (
            SELECT  s_id
            FROM    applications
                    JOIN colleges USING (c_name)
            WHERE   state = ?
        )
        ''',
        [state]
    )
    db.commit()


main()

In the previous program, we just wanted to fetch data from our database, but this time we want to update it – to do that we need to do a commit on our connection, so we call db.commit() (it's the end of a transaction, and we'll talk about transactions in a couple of weeks time).

Near the end of the lecture we started to write a program which lets administrators admit students into programs. We first fetched the maximum enrollment and the number of admitted students:

import sqlite3
import sys


db = sqlite3.connect('colleges.sqlite')


def main():
    major, college = sys.argv[1:]
    c = db.cursor()
    c.execute(
        '''
        SELECT    enrollment
        FROM      programs
        WHERE     major = ? AND c_name = ?
        ''',
        [major, college]
    )
    enrollment, = c.fetchone()
    c.execute(
        '''
        SELECT     count()
        FROM       applications
        WHERE      major = ? AND c_name = ?
                   AND decision = 'Y'
        ''',
        [major, college]
    )
    admitted, = c.fetchone()
    print(f'  {admitted} of {enrollment} already admitted')


main()

I was asked if we really needed two separate queries for doing this, and I said that there is a 'cleverer' way of doing it using CTE:s and a CROSS JOIN – I promised to show it in these notes, so here we go:

import sqlite3
import sys


db = sqlite3.connect('colleges.sqlite')


def main():
    major, college = sys.argv[1:]
    c = db.cursor()
    c.execute(
          '''
          WITH
             enrollments(enrollment) AS (
                SELECT    enrollment
                FROM      programs
                WHERE     major = ? AND c_name = ?
             ),
             admissions(admitted) AS (
                SELECT    count()
                FROM      applications
                WHERE     major = ? AND c_name = ? AND
                          decision = 'Y'
            )
          SELECT   enrollment, admitted
          FROM     enrollments
                   CROSS JOIN admissions;
          ''',
          [major, college, major, college]
      )
    enrollment, admitted = c.fetchone()
    print(f'  {admitted} of {enrollment} already admitted')


main()

Observe that we need to repeat the major and c_name in the c.execute call, since they're needed in each of the CTE:s (there are four question marks). Also observe that each of the CTE:s return just a single row, so the cross join is just one row, with the combined results of the subqueries.

We then continued by listing the students on the waitlist, and came up with:

c.execute(
    '''
    SELECT    s_id,
              s_name,
              gpa,
              rank() OVER (ORDER BY gpa DESC)
    FROM      applications
              JOIN students USING (s_id)
    WHERE     major = ? AND c_name = ?
              AND decision = 'N'
    ''',
    [major, college]
)
for id, name, gpa, rank in c:
    print(f'  {rank:3d} {gpa:4.2f} - {id} ({name})')

We discussed whether to use rank() or row_number() in our window function, in the exercise below I'll argue that rank() could be the most logical choice.


Exercise: At this point, I decided to leave the implementation of the remainder of the program as an exercise until Thursday – we'll finish it at the start of the lecture, and I'd encourage you to have a try on your own before Thursday.

What we want is for the program to list all students on the waitlist as above, and to then ask how many of the students to admit. The program should then update the applications table, and admit the top candidates on the waitlist (i.e., those with the highest gpa). In case there is a tie (several with the same gpa), I think the program should randomly pick some of the applicants (that's why I introduced the random() function above) – before we do that, it seems reasonable to have all tied students listed with the same 'rank' (so rank() would be better than row_number() in the output above).