Example from lecture 5

This is a short description of what I was doing during lecture 5. I spent most of the time going through an example which I think could help you when you're about to solve lab 3.

About DBMS's and SQLite

The main topic of the lecture was JDBC, the software layer which makes it possible to connect to a SQL database from Java code (or, for that matter, any language running on the JVM). One of the first slides showed a schematic picture of some different ways of connecting to a database:

There is nothing special about the connection between a web server and the DBMS, if the server runs on the JVM (as, e.g., Jetty), it would in principle be precisely the same kind of connection as from the Java deskop application (and nothing says that what's called 'Client' in the picture couldn't be running on the JVM as well).

(Text on blue plates, such as this, is optional reading, you can skip it with good conscience, but I got some questions during the intermission which made add it):

This course is about databases, not web technology, but I'd like to point out that it's becoming increasingly popular to structure software around microservices, which are small servers listening for requests (typically JSON) using a simple API (often RESTful). Although they often use http(s) for transfer, they are much smaller and more nimble than a traditional web server.

Microservices are typically written in languages such as Java, C#, Ruby or Python, but interesting new languages and platforms such as Elixir/Phoenix, Scala/Akka, and Haskell/Yesod are cropping up. Some of these frameworks use powerful database libraries, such as Slick (Scala) and Ecto (Elixir).

The front-ends to the microservices are usually written in Javascript, or some version of it (CoffeeScript, TypeScript, ...). It has also become popular to write the front-ends in languages which compiles to Javascript, some examples are Elm, GHCJS (Haskell) and Scala.JS.

Using SQLite's command line client

As we have seen before, SQLite (which is what runs when we're using our Jupyter notebooks) differs from traditional DBMS's in that it's not a dedicated server, but a library linked into our program. It still has a command line client, called sqlite3, and using it is like using clients like psql (PostgreSQL) or mysql (MySQL).

I started the SQLite 'client' by issuing the command:

$ sqlite3
SQLite version 3.13.0 2016-05-18 10:57:30
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> 

Here we can do two things:

During lectures 2 and 3, we looked at an example involving college applications (it was taken from Jennifer Widom), the definition of the tables for that example is given in the file Schema.sql:

DROP TABLE IF EXISTS colleges;
CREATE TABLE colleges (
  c_name       TEXT,
  state        TEXT,
  enrollment   INT,
  PRIMARY KEY  (c_name)
);

DROP TABLE IF EXISTS students;
CREATE TABLE students (
  s_id         INT,
  s_name       TEXT,
  gpa          REAL,
  size_hs      INT,
  PRIMARY KEY  (s_id)
);

DROP TABLE IF EXISTS applications;
CREATE TABLE applications (
  s_id         INT,
  c_name       TEXT,
  major        TEXT,
  decision     TEXT,
  PRIMARY KEY  (s_id, c_name, major),
  FOREIGN KEY  (s_id) REFERENCES students(s_id),
  FOREIGN KEY  (c_name) REFERENCES colleges(c_name)
);

And the data we've used is in the file Data.sql:

DELETE FROM students;
DELETE FROM colleges;
DELETE FROM applications;

INSERT
INTO   students(s_id, s_name, gpa, size_hs)
VALUES (123, 'Amy', 3.9, 1000),
       (234, 'Bob', 3.6, 1500),
       (345, 'Craig', 3.5, 500),
       (456, 'Doris', 3.9, 1000),
       (567, 'Edward', 2.9, 2000),
       (678, 'Fay', 3.8, 200),
       (789, 'Gary', 3.4, 800),
       (987, 'Helen', 3.7, 800),
       (876, 'Irene', 3.9, 400),
       (765, 'Jay', 2.9, 1500),
       (654, 'Amy', 3.9, 1000),
       (543, 'Craig', 3.4, 2000);

INSERT
INTO   colleges(c_name, state, enrollment)
VALUES ('Stanford', 'CA', 15000),
       ('Berkeley', 'CA', 36000),
       ('MIT', 'MA', 10000),
       ('Cornell', 'NY', 21000);
        
INSERT
INTO   applications(s_id, c_name, major, decision)
VALUES (123, 'Stanford', 'CS', 'Y'),
       (123, 'Stanford', 'EE', 'N'),
       (123, 'Berkeley', 'CS', 'Y'),
       (123, 'Cornell', 'EE', 'Y'),
       (234, 'Berkeley', 'biology', 'N'),
       (345, 'MIT', 'bioengineering', 'Y'),
       (345, 'Cornell', 'bioengineering', 'N'),
       (345, 'Cornell', 'CS', 'Y'),
       (345, 'Cornell', 'EE', 'N'),
       (678, 'Stanford', 'history', 'Y'),
       (987, 'Stanford', 'CS', 'Y'),
       (987, 'Berkeley', 'CS', 'Y'),
       (876, 'Stanford', 'CS', 'N'),
       (876, 'MIT', 'biology', 'Y'),
       (876, 'MIT', 'marine biology', 'N'),
       (765, 'Stanford', 'history', 'Y'),
       (765, 'Cornell', 'history', 'N'),
       (765, 'Cornell', 'psychology', 'Y'),
       (543, 'MIT', 'CS', 'N'),
       (321, 'MIT', 'history', 'N'),
       (321, 'MIT', 'psychology', 'N');

We can get this into a SQLite database in various ways, for example:

Using the client, we listed all students:

sqlite> SELECT s_id, s_name, gpa
   ...> FROM   students;
123|Amy|3.9
234|Bob|3.6
345|Craig|3.5
456|Doris|3.9
567|Edward|2.9
678|Fay|3.8
789|Gary|3.4
987|Helen|3.7
876|Irene|3.9
765|Jay|2.9
654|Amy|3.9
543|Craig|3.4
sqlite> ...

There are ways to make the output nicer, if you type .help in the client, you can find out how it's done.

JDBC by example

The code I wrote can be downloaded here -- to try the program out, first create the SQLite database (see above), and then compile and run the code (see below).

To compile and run the code, we can use some build system, such as Gradle, or let some IDE, like IntelliJ or Eclipse handle it -- I used emacs and a very primitive Makefile (actually, my use of it was a bit like playing baseball with a Stradivarius, but I wanted to keep things as simple as possible).

The code is in the directory app, and we can compile it with:

javac app/*.java

This is easy to type, but we don't want to type it every time, so I put it as a very simple Makefile target (just type make compile to compile).

The code uses jdbc, and it is imported as described in the lecture notes. Since java.sql is included in the standard Java library, we need no extra incantations to compile our code, but to run it we need to add a .jar-file with SQLite's driver to our CLASSPATH, we can do it with (if we're still in the directory above the source code):

java -cp ".:sqlite-jdbc.jar" app.Main

This is basically what happened when I typed make run.

Connecting to a SQLite database

In lab 3 you will get the skeleton for a database class -- I used exactly the same skeleton during the lecture. The first lines of the class is show below:

import java.sql.*;

public class Database {

    /**
     * The database connection.
     */
    private Connection conn;

    /**
     * Create the database interface object. Connection to the
     * database is performed later.
     */
    public Database() {
        conn = null;
    }

    /**
     * Open a connection to the database, using the specified user
     * name and password.
     */
    public boolean openConnection(String filename) {
        try {
            Class.forName("org.sqlite.JDBC");
            conn = DriverManager.getConnection("jdbc:sqlite:" + filename);
        } catch (SQLException e) {
            e.printStackTrace();
            return false;
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }
    ...
}

It is the openConnection-method which sets up the connection to our database. If we had used a traditional database server (such as PostgreSQL or MariaDB), we would have to provide a url and username/password for our server, when we use SQLite things are much simpler, we just provide the path to the database file.

So, the two critical lines here are

            Class.forName("org.sqlite.JDBC");
            conn = DriverManager.getConnection("jdbc:sqlite:" + filename);

The first of these lines may look absolutely pointless, but makes sure that the JVM has seen the SQLite JDBC driver -- the second line creates a Connection-object linked to our database file. We can now create Statement- or PreparedStatement-objects from our Connection, and from them we can run our queries and updates (using executeQuery or executeUpdate, see below).

Our first query

We first wanted to list all students in our database -- it's easily done with:

SELECT s_id, s_name, gpa, size_hs
FROM   students

but to do it from Java-code, we need to do few things:

The next task was to see which majors at which colleges were most popular. One way of doing it would be to ask SQLite for all applications, and then count them in Java code -- but that way we would miss out on the power of SQL, much, much better is to use a query like:

SELECT     COUNT() as count, c_name, major
FROM       applications
GROUP BY   major, c_name
ORDER BY   count DESC, c_name, major

So we embedded it by first creating a ApplicationCount-class (I ran my jdbc-class-script, and pressed Ctrl-D after the second String below):

$ jdbc-class
ApplicationCount
count : int
major : String
c_name -> college : String

which produced the class:

class ApplicationCount {

    public final int count;
    public final String major;
    public final String college;

    public ApplicationCount  (ResultSet rs) throws SQLException {
        this.count = rs.getInt("count");
        this.major = rs.getString("major");
        this.college = rs.getString("c_name");
    }
}

We then wrote a function which used a PreparedStatement to query our database:

    public List<ApplicationCount> getApplicationCount() {
        List<ApplicationCount> found = new LinkedList<>();
        try {
            String sql =
                "SELECT COUNT() AS count, c_name, major\n" +
                "FROM   applications\n" +
                "GROUP BY c_name, major\n" +
                "ORDER BY count DESC, c_name, major\n";

            PreparedStatement ps = conn.prepareStatement(sql);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                found.add(new ApplicationCount(rs));
            }
            return found;

        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
        }
        return found;
    }

and finally we used this in our main program:

    void showMostCommonApplications() {
        for (ApplicationCount ac : db.getApplicationCount()) {
            System.out.println(String.format("%4d: %s @ %s",
                                             ac.count,
                                             ac.major,
                                             ac.college));
        }
    }

When we ran it, we got the output:

   3: CS @ Stanford
   2: CS @ Berkeley
   2: EE @ Cornell
   2: history @ Stanford
   1: biology @ Berkeley
   1: CS @ Cornell
   1: bioengineering @ Cornell
   1: history @ Cornell
   1: psychology @ Cornell
   1: CS @ MIT
   1: bioengineering @ MIT
   1: biology @ MIT
   1: history @ MIT
   1: marine biology @ MIT
   1: psychology @ MIT
   1: EE @ Stanford

The lecture slides are quite comprehensive, hopefully you can try out some updates and parametrized queries (with PreparedStatement's) using this skeleton and the lecture slides -- you could try to create some new students, and ask for information about a given student.

Generating boilerplate classes

The script (jdbc-class) which I used to generate our simple classes is included in the zip-file -- it's a quick hack I wrote just before the lecture, so you shouldn't really pay any attention to what's in it, but you may find it useful for lab3. You'll need Scala to be installed on your computer to run the script.

Using the script, you first define the name of the class on one line, and on subsequent lines, you tell the names of the SQL-attributes, the names you'd prefer to use in your Java code, and the types of the values (using lowercase for int and double). If you're satisfied with the SQL name, you only need to tell its type. So, to create our StudentInfo class, I typed (with Ctrl-D after the last type):

$ jdbc-class
StudentInfo
s_id -> studentId : int
s_name -> studentName : String
gpa : double
size_hs -> hsSize : int