Lecture 4 – notes

Picking up where we left last time…

We first finished the model from Monday, and ended up with:

lect-04-library-er.png

Translating our model into a database

In the slides I had some 'recipes' for translating an ER model into tables in a relational database, we started out by creating the 'obvious' parts of some of the 'obvious' tables (i.e., each entity set is translated into a corresponding table, and each attribut is translated into a column in the table).

During the lecture I only wrote a few tables, to illustrate some points, here I've first written the obvious parts of all entity sets – I've also defined a primary key where we can (there's no good key in the loans table at the moment):

-- This is only the first iteration, more will be added below

CREATE TABLE authors (
  author_name  TEXT,
  nationality  TEXT,
  PRIMARY KEY  (author_name)
);

CREATE TABLE books (
  isbn         TEXT,
  year         INT,
  title        TEXT,
  PRIMARY KEY  (isbn)
);

CREATE TABLE copies (
  copy_ean     TEXT,
  shelf        TEXT,
  condition    INT,
  PRIMARY KEY  (copy_ean)
);

CREATE TABLE libraries (
  library_name  TEXT,
  muncipality   TEXT,
  PRIMARY KEY   (library_name)
);

CREATE TABLE librarian (
  librarian_id    TEXT,
  librarian_name  TEXT,
  PRIMARY KEY     (librarian_id)
);

CREATE TABLE loans (
  out_date      DATE,
  last_in_date  DATE,
  return_date   DATE
);

CREATE TABLE borrower (
  card_id      TEXT,
  ssn          TEXT,
  name         TEXT,
  email        TEXT,
  hashed_pwd   TEXT,
  PRIMARY KEY  (card_id)
);

We then stopped and looked at how to translate different kinds of associations into SQL, and we started by trying to figure out how to implement the *-1 from Copy to Book. You came up with the idea of letting the copies table have an attribute which keeps track of the proper row in the books table – in the slides this kind of attribute is called a foreign key:

CREATE TABLE copies (
  copy_ean     TEXT,
  shelf        TEXT,
  condition    INT,
  isbn         TEXT,
  PRIMARY KEY  (copy_ean),
  FOREIGN KEY  (isbn) REFERENCES books(isbn)
);

Now we can use a JOIN if we want to get information about the book a copy refers to – let's say we are given an ean code, and want the corresponding title and year:

SELECT   title, year
FROM     copies
         JOIN books USING (isbn)
WHERE    copy_ean = '...';

In the same vein, we could add foreign keys to the tables copies (we also need to link them to libraries), loans, and borrowers:

-- Now with added foreign keys, but we still need more...

CREATE TABLE copies (
  copy_ean     TEXT,
  shelf        TEXT,
  condition    INT,
  isbn         TEXT,
  PRIMARY KEY  (copy_ean),
  FOREIGN KEY  (isbn) REFERENCES books(isbn)
);

CREATE TABLE librarians (
  librarian_id    TEXT,
  librarian_name  TEXT,
  library_name    TEXT,
  PRIMARY KEY     (librarian_id),
  FOREIGN KEY     (library_name) REFERENCES libraries(library_name)
);

CREATE TABLE loans (
  copy_ean      TEXT,
  card_id       TEXT,
  out_date      DATE,
  last_in_date  DATE,
  return_date   DATE,
  PRIMARY KEY   (copy_ean, card_id, out_date),
  FOREIGN KEY   (copy_ean) REFERENCES copies(copy_ean),
  FOREIGN KEY   (card_id) REFERENCES borrowers(card_id)
);

CREATE TABLE borrowers (
  card_id       TEXT,
  ssn           TEXT,
  name          TEXT,
  email         TEXT,
  hashed_pwd    TEXT,
  library_name  TEXT,
  PRIMARY KEY   (card_id),
  FOREIGN KEY   (library_name) REFERENCES libraries(library_name)
);

Since a borrower could potentially have library cards at many libraries, the primary key in the borrowers table can't be just ssn. We can use the tuple (ssn, library_name), but easier is to just use the card_id (which is what I did above).

In the loans table, our foreign keys to copies and borrowers help us getting a key (we had none above), but since a borrower can borrow the same copy many times, we also need the date of the loan to make it unique (assuming no one in their right mind borrows the same copy several times in a day… – in that case we could instead use an invented key).

To simplify things here, I haven't implemented the foreign keys to Librarian in the ER-diagram above (we could have one out_librarian_id for the librarian handling the borrowing of the copy, and one in_librarian_id for the librarian handling the returning of the book).


Exercise 1: Write a query which shows the title and year of all books the borrower with the ssn '20000101-1234' has borrowed at 'Lunds stadsbibliotek' (there is an answer at the end of this page, but you should definitely try to write the query yourself before looking at the answer!).


We then talked about how to implement a *-* association, and I hope it was clear to you why we couldn't just let each table have a foreign key to the other (we can only do that if we're referencing just one element in the other table, here we're potentially referencing many).

So we came up with join tables (see the slides), and we connected the authors table and books table with:

CREATE TABLE authorships (
  author_name  TEXT,
  isbn         TEXT,
  PRIMARY KEY  (author_name, isbn),
  FOREIGN KEY  (author_name) REFERENCES authors(name),
  FOREIGN KEY  (isbn) REFERENCES books(isbn)
);

We also talked a bit about how to implement guardians for children – one way is to have the guardian's card-id as a foreign key:

CREATE TABLE borrowers (
  card_id           TEXT,
  ssn               TEXT,
  name              TEXT,
  email             TEXT,
  hashed_pwd        TEXT,
  library_name      TEXT,
  guardian_card_id  TEXT,
  PRIMARY KEY       (card_id),
  FOREIGN KEY       (library_name) REFERENCES libraries(library_name),
  FOREIGN KEY       (guardian_card_id) REFERENCES borrowers(card_id)
);

This works fine, and would mean that everyone who doesn't have a guardian will have their guardian_card_id as NULL (a foreign key is allowed to be NULL, it's only when it isn't NULL that the database checks that it refers to a valid row in the foreign table).

To get the name of a guardian with this implementation is a bit tricky – we need to join the table borrowers with itself. This is called a self join, and it will introduce potential naming conflicts (see the notebook for lecture one, specifically the section about correlated subqueries). To list the names of all children and their guardians, we can use the following query:

SELECT    borrowers.name, guardians.name
FROM      borrowers
          JOIN borrowers AS guardians
              ON borrowers.guardian_card_id = guardians.card_id
ORDER BY  borrowers.name;

We want to join each row in the borrowers table where guardian_card_id isn't NULL with the corresponding row where card_id is the guardian_card_id of the original row. To do this, we need to 'rename' the table of the guardian borrowers, in the code above I call it 'guardians', and then the query works. But since the guardians table (which is the same as the borrowers table) have the column name, I must be specific about which name I'm referring to.

We can introduce the alias children for the 'outer' table (the table in which we look for rows with references to guardians):

SELECT    children.name, children.name
FROM      borrowers AS children
          JOIN borrowers AS guardians ON children.guardian_card_id = guardians.card_id
ORDER BY  children.name;

but I'm not sure this makes the query easier to understand (but YMMV).

An alternative would be to introduce a separate table with guardians (it would make sense if we have only few children in our database, and don't want all the NULL guardians hanging around):

CREATE TABLE guardians (
  card_id           TEXT,
  guardian_card_id  TEXT,
  PRIMARY KEY       (card_id),
  FOREIGN KEY       (card_id) REFERENCES borrowers(card_id),
  FOREIGN KEY       (guardian_card_id) REFERENCES borrowers(card_id)
);

The 'rules' above will take us quite far when we translate an ER model into a relational database:

  • an entity set will become a table
  • a *-1 ('many-to-one') association will become a foreign key on the * side of the association
  • a *-* ('many-to-many) association will become a new (join) table

For those *-* associations with an association class, the attributes of the association class will become attributes in the join table.


Exercise 2: Write a query to find the names and nationalities of the three most borrowed authors (answer below).


We then looked at some kinds of association which requires careful thought (see the slides), and three methods to implement inheritance (see the slides).

We concluded by looking at how to insert, update and delete data in our database – see the links above, and the slides.

If you have any questions about what we did, please go to Moodle and sign up for the QA session next week.

Solutions to exercises

Exercise 1

We need to join several tables: we start at the loans, then we need to:

  • add information about the copy (to get the isbn code), so we JOIN copies, using the ean_code, which is in both loans and copies
  • add information about the book (to get its title and year), so we JOIN books, using the isbn code, which is in copies and books
  • add information about the borrowers (to get the ssn and library), so we JOIN borrowers using the card-id:
SELECT   title, year
FROM     loans
         JOIN copies USING (copy_ean)
         JOIN books USING (isbn)
         JOIN borrowers USING (card_id)
WHERE    ssn = '20000101-1234' AND
         library_name = 'Lunds stadsbibliotek';

Exercise 2

Now we need to join even more tables, and then to group on the author names (which, amazingly!, were unique):

SELECT    author_name, nationality, count()
FROM      loans
          JOIN copies USING (copy_ean)
          JOIN books USING (isbn)
          JOIN authorships USING (isbn)
          JOIN authors USING (author_name)
GROUP BY  author_name
ORDER BY  count() DESC
LIMIT     3;