Example from lecture 5

This is a short description of what I was doing during lecture 5. I spent most of the time going through an example which I think could help you when you're about to solve lab 3.

About DBMS's and SQLite

The main topic of the lecture was JDBC, the software layer which makes it possible to connect to a SQL database from Java code (or, for that matter, any language running on the JVM). One of the first slides showed a schematic picture of some different ways of connecting to a database:

Using an SQL-client (the picture shows the MySQL client, which was used in the course the last couple of years, this year we're using SQLite instead)
Using a Java desktop application (not much different from doing it in any language, but uses JDBC)
Using a web server

There is nothing special about the connection between a web server and the DBMS, if the server runs on the JVM (as, e.g., Jetty), it would in principle be precisely the same kind of connection as from the Java deskop application (and nothing says that what's called 'Client' in the picture couldn't be running on the JVM as well).

(Text on blue plates, such as this, is optional reading, you can skip it with good conscience, but I got some questions during the intermission which made add it):

This course is about databases, not web technology, but I'd like to point out that it's becoming increasingly popular to structure software around microservices, which are small servers listening for requests (typically JSON) using a simple API (often RESTful). Although they often use http(s) for transfer, they are much smaller and more nimble than a traditional web server.

Microservices are typically written in languages such as Java, C#, Ruby or Python, but interesting new languages and platforms such as Elixir/Phoenix, Scala/Akka, and Haskell/Yesod are cropping up. Some of these frameworks use powerful database libraries, such as Slick (Scala) and Ecto (Elixir).

The front-ends to the microservices are usually written in Javascript, or some version of it (CoffeeScript, TypeScript, ...). It has also become popular to write the front-ends in languages which compiles to Javascript, some examples are Elm, GHCJS (Haskell) and Scala.JS.

Using SQLite's command line client

As we have seen before, SQLite (which is what runs when we're using our Jupyter notebooks) differs from traditional DBMS's in that it's not a dedicated server, but a library linked into our program. It still has a command line client, called sqlite3, and using it is like using clients like psql (PostgreSQL) or mysql (MySQL).

I started the SQLite 'client' by issuing the command:

$ sqlite3
SQLite version 3.13.0 2016-05-18 10:57:30
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>

Here we can do two things:

Enter SQL commands -- these are the common suspects, SELECT, CREATE, etc.
Enter SQLite commands -- these are commands for reading from files, saving to files, etc, they're all prefixed by a ., so to read a database from the file sample.db, we write (sqlite> is just the prompt):
```
sqlite> .open sample.db
```

During lectures 2 and 3, we looked at an example involving college applications (it was taken from Jennifer Widom), the definition of the tables for that example is given in the file Schema.sql:

DROP TABLE IF EXISTS colleges;
CREATE TABLE colleges (
  c_name       TEXT,
  state        TEXT,
  enrollment   INT,
  PRIMARY KEY  (c_name)
);

DROP TABLE IF EXISTS students;
CREATE TABLE students (
  s_id         INT,
  s_name       TEXT,
  gpa          REAL,
  size_hs      INT,
  PRIMARY KEY  (s_id)
);

DROP TABLE IF EXISTS applications;
CREATE TABLE applications (
  s_id         INT,
  c_name       TEXT,
  major        TEXT,
  decision     TEXT,
  PRIMARY KEY  (s_id, c_name, major),
  FOREIGN KEY  (s_id) REFERENCES students(s_id),
  FOREIGN KEY  (c_name) REFERENCES colleges(c_name)
);

And the data we've used is in the file Data.sql:

DELETE FROM students;
DELETE FROM colleges;
DELETE FROM applications;

INSERT
INTO   students(s_id, s_name, gpa, size_hs)
VALUES (123, 'Amy', 3.9, 1000),
       (234, 'Bob', 3.6, 1500),
       (345, 'Craig', 3.5, 500),
       (456, 'Doris', 3.9, 1000),
       (567, 'Edward', 2.9, 2000),
       (678, 'Fay', 3.8, 200),
       (789, 'Gary', 3.4, 800),
       (987, 'Helen', 3.7, 800),
       (876, 'Irene', 3.9, 400),
       (765, 'Jay', 2.9, 1500),
       (654, 'Amy', 3.9, 1000),
       (543, 'Craig', 3.4, 2000);

INSERT
INTO   colleges(c_name, state, enrollment)
VALUES ('Stanford', 'CA', 15000),
       ('Berkeley', 'CA', 36000),
       ('MIT', 'MA', 10000),
       ('Cornell', 'NY', 21000);
        
INSERT
INTO   applications(s_id, c_name, major, decision)
VALUES (123, 'Stanford', 'CS', 'Y'),
       (123, 'Stanford', 'EE', 'N'),
       (123, 'Berkeley', 'CS', 'Y'),
       (123, 'Cornell', 'EE', 'Y'),
       (234, 'Berkeley', 'biology', 'N'),
       (345, 'MIT', 'bioengineering', 'Y'),
       (345, 'Cornell', 'bioengineering', 'N'),
       (345, 'Cornell', 'CS', 'Y'),
       (345, 'Cornell', 'EE', 'N'),
       (678, 'Stanford', 'history', 'Y'),
       (987, 'Stanford', 'CS', 'Y'),
       (987, 'Berkeley', 'CS', 'Y'),
       (876, 'Stanford', 'CS', 'N'),
       (876, 'MIT', 'biology', 'Y'),
       (876, 'MIT', 'marine biology', 'N'),
       (765, 'Stanford', 'history', 'Y'),
       (765, 'Cornell', 'history', 'N'),
       (765, 'Cornell', 'psychology', 'Y'),
       (543, 'MIT', 'CS', 'N'),
       (321, 'MIT', 'history', 'N'),
       (321, 'MIT', 'psychology', 'N');

We can get this into a SQLite database in various ways, for example:

We can start SQLite, and do it from inside:

$ sqlite3
...
sqlite> .tables
sqlite> .read Schema.sql
sqlite> .tables
applications  colleges      students
sqlite> .schema students
CREATE TABLE students (
  s_id         INT,
  s_name       TEXT,
  gpa          REAL,
  size_hs      INT,
  PRIMARY KEY  (s_id)
);
sqlite> SELECT COUNT() FROM students;
0
sqlite> .read Data.sql
sqlite> SELECT COUNT() FROM students;
12
sqlite> .save sample.db
sqlite> ...

We can call SQLite using a filename and redirect input to make it read from our .sql-files:
```
sqlite3 sample.db < Schema.sql
sqlite3 sample.db < Data.sql
```
This is the way I prefer (since it's easier to script), but YMMV.

Using the client, we listed all students:

sqlite> SELECT s_id, s_name, gpa
   ...> FROM   students;
123|Amy|3.9
234|Bob|3.6
345|Craig|3.5
456|Doris|3.9
567|Edward|2.9
678|Fay|3.8
789|Gary|3.4
987|Helen|3.7
876|Irene|3.9
765|Jay|2.9
654|Amy|3.9
543|Craig|3.4
sqlite> ...

There are ways to make the output nicer, if you type .help in the client, you can find out how it's done.

JDBC by example

The code I wrote can be downloaded here -- to try the program out, first create the SQLite database (see above), and then compile and run the code (see below).

To compile and run the code, we can use some build system, such as Gradle, or let some IDE, like IntelliJ or Eclipse handle it -- I used emacs and a very primitive Makefile (actually, my use of it was a bit like playing baseball with a Stradivarius, but I wanted to keep things as simple as possible).

The code is in the directory app, and we can compile it with:

javac app/*.java

This is easy to type, but we don't want to type it every time, so I put it as a very simple Makefile target (just type make compile to compile).

The code uses jdbc, and it is imported as described in the lecture notes. Since java.sql is included in the standard Java library, we need no extra incantations to compile our code, but to run it we need to add a .jar-file with SQLite's driver to our CLASSPATH, we can do it with (if we're still in the directory above the source code):

java -cp ".:sqlite-jdbc.jar" app.Main

This is basically what happened when I typed make run.

Connecting to a SQLite database

In lab 3 you will get the skeleton for a database class -- I used exactly the same skeleton during the lecture. The first lines of the class is show below:

import java.sql.*;

public class Database {

    /**
     * The database connection.
     */
    private Connection conn;

    /**
     * Create the database interface object. Connection to the
     * database is performed later.
     */
    public Database() {
        conn = null;
    }

    /**
     * Open a connection to the database, using the specified user
     * name and password.
     */
    public boolean openConnection(String filename) {
        try {
            Class.forName("org.sqlite.JDBC");
            conn = DriverManager.getConnection("jdbc:sqlite:" + filename);
        } catch (SQLException e) {
            e.printStackTrace();
            return false;
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }
    ...
}

It is the openConnection-method which sets up the connection to our database. If we had used a traditional database server (such as PostgreSQL or MariaDB), we would have to provide a url and username/password for our server, when we use SQLite things are much simpler, we just provide the path to the database file.

So, the two critical lines here are

            Class.forName("org.sqlite.JDBC");
            conn = DriverManager.getConnection("jdbc:sqlite:" + filename);

The first of these lines may look absolutely pointless, but makes sure that the JVM has seen the SQLite JDBC driver -- the second line creates a Connection-object linked to our database file. We can now create Statement- or PreparedStatement-objects from our Connection, and from them we can run our queries and updates (using executeQuery or executeUpdate, see below).

Our first query

We first wanted to list all students in our database -- it's easily done with:

SELECT s_id, s_name, gpa, size_hs
FROM   students

but to do it from Java-code, we need to do few things:

We need to write the actual JDBC-code to make a query. Many things can go wrong when we make calls to our database, so we're going to need a try-catch-statement, and we need to catch SQLExceptions.

Then we need a way to communicate the result of the query to the rest of our program -- we want the interface between our Database class and the rest of the program to be as simple as possible, so I defined a class StudentInfo which contains the attributes of the tuples returned from the query -- we'll see how I created the class in the next step below.

In this first task, we used a simple Statement to send our query, and we create the statement using our connection. We did it like this:
```
Statement s = conn.createStatement();
```
After that, we can get a ResultSet object with all the found tuples using a call to executeQuery on our Statement-object -- the whole method is seen here:
```
public List<StudentInfo> studentInfo() {
    List<StudentInfo> found = new LinkedList<>();
    try {
        String sql =
            "SELECT s_id, s_name, gpa, size_hs\n" +
            "FROM   students";
        Statement s = conn.createStatement();
        ResultSet rs = s.executeQuery(sql);
        while (rs.next()) {
            found.add(new StudentInfo(rs));
        }
        return found;
    } catch (SQLException e) {
        e.printStackTrace();
    } finally {
        // here we should close our statement
    }
    return found;
}
```
We then defined the StudentInfo class. It should contain the values returned in the query (s_id, s_name, gpa, and size_hs), but we want other names for our attributes (such as studentId, studentName, gpa, and hsSize). The class will contain a lot of boilerplate, one way of implementing it would be:
```
class StudentInfo {

    public final int studentId;
    public final String studentName;
    public final double gpa;
    public final int hsSize;

    public StudentInfo  (ResultSet rs) throws SQLException {
        this.studentId = rs.getInt("s_id");
        this.studentName = rs.getString("s_name");
        this.gpa = rs.getDouble("gpa");
        this.hsSize = rs.getInt("size_hs");
    }
}
```
These objects are immutable, and the only interesting thing about them is the code inside the constructor -- there we use the ResultSet methods getString, getInt and getDouble to fetch values from a tuple returned from a query.

To free myself from the tedious task of writing classes such as StudentInfo by hand, I wrote a simple script, jdbc-class, which lets me define them using a simple description (see the bottom of this page for more information about that).

We add code in our main program to call our Database-method (db is the Database reference in our main program):

void showStudents() {
    for (StudentInfo s : db.studentInfo()) {
        System.out.println(String.format("%d: %s (%.2f)",
                                         s.studentId,
                                         s.studentName,
                                         s.gpa));
    }
}

And that's about it! We tried this and got:

123: Amy (3.90)
234: Bob (3.60)
345: Craig (3.50)
456: Doris (3.90)
567: Edward (2.90)
678: Fay (3.80)
789: Gary (3.40)
987: Helen (3.70)
876: Irene (3.90)
765: Jay (2.90)
654: Amy (3.90)
543: Craig (3.40)

The next task was to see which majors at which colleges were most popular. One way of doing it would be to ask SQLite for all applications, and then count them in Java code -- but that way we would miss out on the power of SQL, much, much better is to use a query like:

SELECT     COUNT() as count, c_name, major
FROM       applications
GROUP BY   major, c_name
ORDER BY   count DESC, c_name, major

So we embedded it by first creating a ApplicationCount-class (I ran my jdbc-class-script, and pressed Ctrl-D after the second String below):

$ jdbc-class
ApplicationCount
count : int
major : String
c_name -> college : String

which produced the class:

class ApplicationCount {

    public final int count;
    public final String major;
    public final String college;

    public ApplicationCount  (ResultSet rs) throws SQLException {
        this.count = rs.getInt("count");
        this.major = rs.getString("major");
        this.college = rs.getString("c_name");
    }
}

We then wrote a function which used a PreparedStatement to query our database:

    public List<ApplicationCount> getApplicationCount() {
        List<ApplicationCount> found = new LinkedList<>();
        try {
            String sql =
                "SELECT COUNT() AS count, c_name, major\n" +
                "FROM   applications\n" +
                "GROUP BY c_name, major\n" +
                "ORDER BY count DESC, c_name, major\n";

            PreparedStatement ps = conn.prepareStatement(sql);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                found.add(new ApplicationCount(rs));
            }
            return found;

        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
        }
        return found;
    }

and finally we used this in our main program:

    void showMostCommonApplications() {
        for (ApplicationCount ac : db.getApplicationCount()) {
            System.out.println(String.format("%4d: %s @ %s",
                                             ac.count,
                                             ac.major,
                                             ac.college));
        }
    }

When we ran it, we got the output:

   3: CS @ Stanford
   2: CS @ Berkeley
   2: EE @ Cornell
   2: history @ Stanford
   1: biology @ Berkeley
   1: CS @ Cornell
   1: bioengineering @ Cornell
   1: history @ Cornell
   1: psychology @ Cornell
   1: CS @ MIT
   1: bioengineering @ MIT
   1: biology @ MIT
   1: history @ MIT
   1: marine biology @ MIT
   1: psychology @ MIT
   1: EE @ Stanford

The lecture slides are quite comprehensive, hopefully you can try out some updates and parametrized queries (with PreparedStatement's) using this skeleton and the lecture slides -- you could try to create some new students, and ask for information about a given student.

Generating boilerplate classes

The script (jdbc-class) which I used to generate our simple classes is included in the zip-file -- it's a quick hack I wrote just before the lecture, so you shouldn't really pay any attention to what's in it, but you may find it useful for lab3. You'll need Scala to be installed on your computer to run the script.

Using the script, you first define the name of the class on one line, and on subsequent lines, you tell the names of the SQL-attributes, the names you'd prefer to use in your Java code, and the types of the values (using lowercase for int and double). If you're satisfied with the SQL name, you only need to tell its type. So, to create our StudentInfo class, I typed (with Ctrl-D after the last type):

$ jdbc-class
StudentInfo
s_id -> studentId : int
s_name -> studentName : String
gpa : double
size_hs -> hsSize : int