This is an old revision of the document!

Web Development Lesson 7 - Databases

Joins

Objective

In this activity you'll use data from two tables and you'll learn why separating data into multiple tables is a good idea.

Setup

We'll use PHPMyAdmin again to run our queries, so log in to that and open the database for your class.

Null Value

Imagine we want to extend our database of Marvel heroes to include the films they appear in.
One way to do this would be to add fields (columns) for each film like this.

character_name	hero_villain	first_appeared	power	iron_man	first_avenger	thor	avengers
Black Widow	Hero	1964	Skill	NULL	NULL	NULL	True
Captain America	Hero	1941	Biological	NULL	True	NULL	True
Hulk	Hero	1962	Biological	NULL	NULL	NULL	True
Iron Man	Hero	1963	Technology	True	NULL	NULL	True
Loki	Villain	1949	Magic	NULL	NULL	True	True
Scarlet Witch	Hero	1964	Magic	NULL	NULL	NULL	NULL

NULL</hmtl> means 'no information' and we'd read this to mean 'false' when we interpret the results. ==== Multiple Tables ==== * Now consider if we added a column for all 27 films released so far in the MCU, not to mention the 4 TV series. Our table would be enormous, and most of it would be empty (NULL). * That would take more time to manage, slow down our queries and take more space on our server. None of these are good outcomes, but what can we do about it? * We could make each combination of hero and movie its own row of data, like this. ^ character_name ^ hero_villain ^ first_appeared ^ power ^ movie ^ | Black Widow | Hero | 1964 | Skill | Iron Man | | Black Widow | Hero | 1964 | Skill | Avengers | | Captain America | Hero | 1941 | Biological | First Avenger | | Captain America | Hero | 1941 | Biological | Avengers | | Hulk | Hero | 1962 | Bilogical | Avengers | | Iron Man | Hero | 1963 | Technology | Iron Man | | Iron Man | Hero | 1963 | Technology | Avengers | | Loki | Villain | 1949 | Magic | Avengers | * This looks much better. It takes less space, and would be faster, but it still has problems. * The first is that a lot of data is repeated. See how we have Black Widow's information (hero_villain, first_appeared and power) twice. If we want to update that data, we'd have to update every single row with Black Widow. That will slow things down and could lead to errors. * The second is more subtle. Did you notice that we're missing Scarlet Witch? We could include a row for her without movie data, but how would we manage later when we added Age of Ultron? Would we leave a NULL row taking up space in the database for no purpose? We could check for NULL rows and replace those when we add the first movie, but it's not ideal. * The answer is to separate the data into two tables. One with information about the hero, and another tracking which films they've been in. * You've seen the first already. * Let's take a look at the second. Click on the table called 'appearances' in the left pane of PHPMyAdmin. * Here we see a table with a list of hero/movie pairs. * Remember that every table must have a unique index. But in this list, Iron Man and Loki both appear more than once in 'character_name' and 'The Avengers' appears five times in 'movie', so neither is appropriate for our primary index. * Instead, we create a new field that will be unique. By custom, we call it 'id'. * Click on the 'Structure' tab to see how this is defined. * It's an integer with the 'unsigned' attribute. This means that it can only be positive numbers. In computing, where numbers are stored as a collection of ones and zeros, each called a 'bit', one bit is usually used to indicate whether a number is positive or negative. By making our index unsigned, we are doubling the number of rows we can have. * Also notice under 'Extra' that the 'id' field is set to 'AUTO_INCREMENT' which means that when we add rows the new row will have a number one higher than the highest number previously used. This saves us from having to find an unused number ourselves. * So how do we combine the data from these tables? ==== Joins ==== * Click on the 'SQL' tab. * Overwrite the query with this one. You can keep it all on one line, but spreading it out makes it easier to read. <code>SELECT * FROM heroes LEFT JOIN appearances ON heroes.character_name=appearances.character_name</code> * The results now show all the information that we had in our most recent table. * The advantage here is that we only need to update one line in our heroes table for each hero, so there is much less risk of error and we're taking less storage space. It's also quicker to process, but that's probably not noticeable for such a small database. * Let's break down the query. * <html>SELECT * is the same as before. It says to show all fields. In this case, show all fields from all tables included in the query.
FROM heroes is the same as before, though I've left out the backticks. Backticks are not required in most cases.
The last line is new. LEFT JOIN means that we're connecting all rows from the second table that match the first table, and we're using the 'character_name' field of both tables to identify the match (heroes.character_name=appearances.character_name).
Notice that we identify the fields to use by naming the database first and separating them by a . . This ensures that the database doesn't get confused because the field names are the same in both tables.
Also notice that Scarlet Witch appears in the results because she's in the first table, even though she's not listed in the second table. The missing information is listed as NULL. Let's see what happens if we swap the tables around.
We could swap the table names in the query FROM appearances LEFT JOIN heroes ON heroes.character_name=appearances.character_name, but we can also simply change the type of join from LEFT to RIGHT.

SELECT *
FROM heroes
	RIGHT JOIN appearances ON heroes.character_name=appearances.character_name

When you run this query, Scarlet Witch disappears because the second table (appearances) is our base and she doesn't appear in that table.
Let's go back to the original query so we can see Scarlet Witch.

SELECT *
FROM heroes
	LEFT JOIN appearances ON heroes.character_name=appearances.character_name

Before we go on, notice that we've got unnecessary information in our results. 'id' is required by the database, but for now it has no value for us. Nor is there any value in repeating the 'character_name' data.
Let's grab only the fields we actually need.

SELECT character_name, hero_villain, first_appeared, power, movie
FROM heroes
	LEFT JOIN appearances ON heroes.character_name=appearances.character_name

Notice that we've replaced * with the list of specific fields we want to include.
Unfortunately, when you run the query you'll see an error saying that the column 'character_name' is ambiguous. Although they're exactly the same in both tables, the database needs to be told which one to choose.

SELECT heroes.character_name, hero_villain, first_appeared, power, movie
FROM heroes
	LEFT JOIN appearances ON heroes.character_name=appearances.character_name

This query should give us exactly what we want.

Next: PHP Database Connections

Tech School

Table of Contents

Web Development Lesson 7 - Databases

Joins

Objective

Setup

Null Value

Tech School

User Tools

Site Tools

Table of Contents

Web Development Lesson 7 - Databases

Joins

Objective

Setup

Null Value

Page Tools