This is an old revision of the document!
In this activity you'll use data from two tables and you'll learn why separating data into multiple tables is a good idea.
character_name | hero_villain | first_appeared | power | iron_man | first_avenger | thor | avengers |
---|---|---|---|---|---|---|---|
Black Widow | Hero | 1964 | Skill | NULL | NULL | NULL | True |
Captain America | Hero | 1941 | Biological | NULL | True | NULL | True |
Hulk | Hero | 1962 | Biological | NULL | NULL | NULL | True |
Iron Man | Hero | 1963 | Technology | True | NULL | NULL | True |
Loki | Villain | 1949 | Magic | NULL | NULL | True | True |
Scarlet Witch | Hero | 1964 | Magic | NULL | NULL | NULL | NULL |
NULL</hmtl> means 'no information' and we'd read this to mean 'false' when we interpret the results.
==== Multiple Tables ====
* Now consider if we added a column for all 27 films released so far in the MCU, not to mention the 4 TV series. Our table would be enormous, and most of it would be empty (NULL).
* That would take more time to manage, slow down our queries and take more space on our server. None of these are good outcomes, but what can we do about it?
* We could make each combination of hero and movie its own row of data, like this.
^ character_name ^ hero_villain ^ first_appeared ^ power ^ movie ^
| Black Widow | Hero | 1964 | Skill | Iron Man |
| Black Widow | Hero | 1964 | Skill | Avengers |
| Captain America | Hero | 1941 | Biological | First Avenger |
| Captain America | Hero | 1941 | Biological | Avengers |
| Hulk | Hero | 1962 | Bilogical | Avengers |
| Iron Man | Hero | 1963 | Technology | Iron Man |
| Iron Man | Hero | 1963 | Technology | Avengers |
| Loki | Villain | 1949 | Magic | Avengers |
* This looks much better. It takes less space, and would be faster, but it still has problems.
* The first is that a lot of data is repeated. See how we have Black Widow's information (hero_villain, first_appeared and power) twice. If we want to update that data, we'd have to update every single row with Black Widow. That will slow things down and could lead to errors.
* The second is more subtle. Did you notice that we're missing Scarlet Witch? We could include a row for her without movie data, but how would we manage later when we added Age of Ultron? Would we leave a NULL row taking up space in the database for no purpose? We could check for NULL rows and replace those when we add the first movie, but it's not ideal.
* The answer is to separate the data into two tables. One with information about the hero, and another tracking which films they've been in.
* You've seen the first already.
* Let's take a look at the second. Click on the table called 'appearances' in the left pane of PHPMyAdmin.
* Here we see a table with a list of hero/movie pairs.
* Remember that every table must have a unique index. But in this list, Iron Man and Loki both appear more than once in 'character_name' and 'The Avengers' appears five times in 'movie', so neither is appropriate for our primary index.
* Instead, we create a new field that will be unique. By custom, we call it 'id'.
* Click on the 'Structure' tab to see how this is defined.
* It's an integer with the 'unsigned' attribute. This means that it can only be positive numbers. In computing, where numbers are stored as a collection of ones and zeros, each called a 'bit', one bit is usually used to indicate whether a number is positive or negative. By making our index unsigned, we are doubling the number of rows we can have.
* Also notice under 'Extra' that the 'id' field is set to 'AUTO_INCREMENT' which means that when we add rows the new row will have a number one higher than the highest number previously used. This saves us from having to find an unused number ourselves.
* So how do we combine the data from these tables?
==== Joins ====
* Click on the 'SQL' tab.
* Overwrite the query with this one. You can keep it all on one line, but spreading it out makes it easier to read.
<code>SELECT *
FROM heroes
LEFT JOIN appearances ON heroes.character_name=appearances.character_name</code>
* The results now show all the information that we had in our most recent table.
* The advantage here is that we only need to update one line in our heroes table for each hero, so there is much less risk of error and we're taking less storage space. It's also quicker to process, but that's probably not noticeable for such a small database.
* Let's break down the query.
* <html>SELECT *
is the same as before. It says to show all fields. In this case, show all fields from all tables included in the query.FROM heroes
is the same as before, though I've left out the backticks. Backticks are not required in most cases.LEFT JOIN
means that we're connecting all rows from the second table that match the first table, and we're using the 'character_name' field of both tables to identify the match (heroes.character_name=appearances.character_name
). .
. This ensures that the database doesn't get confused because the field names are the same in both tables.NULL
. Let's see what happens if we swap the tables around.FROM appearances LEFT JOIN heroes ON heroes.character_name=appearances.character_name
, but we can also simply change the type of join from LEFT
to RIGHT
.SELECT * FROM heroes RIGHT JOIN appearances ON heroes.character_name=appearances.character_name
SELECT * FROM heroes LEFT JOIN appearances ON heroes.character_name=appearances.character_name
SELECT character_name, hero_villain, first_appeared, power, movie FROM heroes LEFT JOIN appearances ON heroes.character_name=appearances.character_name
*
with the list of specific fields we want to include.SELECT heroes.character_name, hero_villain, first_appeared, power, movie FROM heroes LEFT JOIN appearances ON heroes.character_name=appearances.character_name