Session 7 – notes

Here are some resources for the Monday lecture in week 4. I spent most of the time going through the slides and working though some examples – you can see the slides here.

Functional dependencies, closures and keys

We first had a look at the concept on redundancy, and some anomalies (see the slides), and then looked at the definition of a functional dependency:

We then discussed closures:

and keys:

We used this to solve the following problem:

and this is what I scribbled (to save time, I left as an exercise the last few closures of three-attribute sets, and also the definition of the last possible four-attribute set – I promised to publish the solutions on this page, and I'll add them at the bottom of this page later this week):

So, we found two candidate keys, CF and ABF, and we discussed the fact that these are both minimal in the sense that nothing can be taken away from them for them to still be keys.

Normal forms, and normalizing to BCNF

Next we talked about normal forms (see the slides), and focused on BCNF, which is the normal form we normally strive for in this course:

The "iff" means "if-and-only-if", so it's a required, but also sufficient condition.

We also saw an algorithm for getting a table which isn't in BCNF into BCNF:

It may look strange at first, but it is really pretty straightforward. The BNCF-violating FD will have a left hand side which isn't a superkey, so it can occur for many rows in the table, and for each of these rows, the columns on the right hand side will repeat.

We solve this problem by moving the right hand side of the offending dependency into its own table (step 3), where it can be looked up using the left hand side of the dependency – so in our new table we have all the attributes of the FD, with the left hand side as a key, and in a new table (step 4), we copy the original table, except the attributes on the right hand side of the FD, which were just put into their own table. Since we have the left hand side of the offending FD in our new copy of the original table, we can look up the values of the removed attributes by joining the table we created in step 3 using the left hand side of the FD.

We then used our newfound knowledge to solve the following problem:

The functional dependencies asserts that a movie can only belong to one category, and we assumed this to be true when we solved the problem, although I had to agree that it's not really true in real life (unless we introduce special categories for genre-bending movies).

Since there are no ways to 'generate' either of title, year, or star, they must all belong to any key, and using them together, we can find all other attributes, so they're a key by themselves (and the only key).

The relation is not in BCNF, since the left hand side of FD2, (title, year), isn't a superkey – so we could have many rows with the same values of (title, year) and then different a star for each such row, and each of these rows would have length and category repeated.

The algorithm for BCNF-decomposition above tells us to break up our initial relation into two relations:

  • one relation with all the attributes from the BCNF-breaking dependency (FD2) – that would give us the table movie_info (it feels like a reasonable name for this relation, since it contains information about movies):

    movie_info(title, year, length, category)
    
  • one relation with the attributes in the left hand side of FD2 (title and year), and the remaining attributes of our inital relation, except for those in the right hand side of FD2 (when we need them, we can always join them in from the other relation) – that would give us a table movie_stars (it keeps track of which actors star in which movie, hence the name):

    movie_stars(title, year, star)
    

This is what I wrote (I've tidied it up a bit, we'll talk about it more on Thursday):

This would give us the following tables in SQL:

CREATE TABLE movie_info (
  title        TEXT,
  year         INT,
  length       INT,    -- in minutes
  category     TEXT,
  PRIMARY KEY  (title, year)
);

CREATE TABLE movie_stars (
  title        TEXT,
  year         INT,
  star         TEXT,
  PRIMARY KEY  (title, year, star),
  FOREIGN KEY  (title, year) REFERENCES movie_info(title, year)
);

To fill these normalized tables with proper values, we could write:

INSERT
INTO     movie_info(title, year, length, category)
  SELECT DISTINCT title, year, length, category
  FROM            movies;

INSERT
INTO     movie_stars(title, year, star)
  SELECT title, year, star
  FROM            movies;

and to get back the contents of the original table, we just do:

SELECT    title, year, length, category, star
FROM      movie_info
          JOIN movie_stars USING (title, year)

This would yield exactly the same rows as we started out with, and that's the point of the algorithm – it allows a lossless reconstruction (we'll talk more about that next time).

Next up was a similar problem:

Solution: We're not in BCNF since the key is (title, year), and fd2 (see below) has a left hand side which is not a superkey – we must therefore split up movies using fd2:

The tables would be:

CREATE TABLE movies (
  title        TEXT,
  year         INT,
  length       INT,    -- in minutes
  category     TEXT,
  studio_name  TEXT,
  PRIMARY KEY  (title, year),
  FOREIGN KEY  (studio_name) REFERENCES studio_info(studio_name)
);

CREATE TABLE studio_info (
  studio_name     TEXT,
  studio_address  TEXT,
  PRIMARY KEY (studio_name)
);

and we can join them together to get the original data as:

SELECT title, year, length, category, studio_name, studio_Address
FROM   movies
       JOIN studio_info USING (studio_name)

We then had an initial look at the college application database we've seen several times already – the attributes are:

  • s_id: a student id (uniuqe for all students)
  • s_name: the name of a student (not necessarily unique)
  • gpa: a grade point average
  • c_name: the name of a college
  • state: the state in which a college is situated
  • enrollment: the number of students attending a college
  • major: the major someone is applying for
  • decision: the outcome of the application ('true' if accepted)

and I scribbled:

I'd suggest you try to use the BCNF-algorithm above yourself before the Thursday lecture (and feel free to ask me questions about it during the QA session on Tuesday).