User Tools

Site Tools


en:web_development:databases:joins

Web Development Lesson 7 - Databases

Joins

Objective

In this activity you'll use data from two tables and you'll learn why separating data into multiple tables is a good idea.

Setup

  • We'll use PHPMyAdmin again to run our queries, so log in to that and open the database for your class.

Null Value

  • Imagine we want to extend our database of Marvel heroes to include the films they appear in.
  • One way to do this would be to add fields (columns) for each film like this.
character_name hero_villain first_appeared power iron_man first_avenger thor avengers
Black Widow Hero 1964 Skill NULL NULL NULL True
Captain America Hero 1941 Biological NULL True NULL True
Hulk Hero 1962 Biological NULL NULL NULL True
Iron Man Hero 1963 Technology True NULL NULL True
Loki Villain 1949 Magic NULL NULL True True
Scarlet Witch Hero 1964 Magic NULL NULL NULL NULL
  • NULL means 'no information' and we'd read this to mean 'false' when we interpret the results.

Multiple Tables

  • Now consider if we added a column for all 27 films released so far in the MCU, not to mention the 4 TV series. Our table would be enormous, and most of it would be empty (NULL).
  • That would take more time to manage, slow down our queries and take more space on our server. None of these are good outcomes, but what can we do about it?
  • We could make each combination of hero and movie its own row of data, like this.
character_name hero_villain first_appeared power movie
Black Widow Hero 1964 Skill Iron Man
Black Widow Hero 1964 Skill Avengers
Captain America Hero 1941 Biological First Avenger
Captain America Hero 1941 Biological Avengers
Hulk Hero 1962 Bilogical Avengers
Iron Man Hero 1963 Technology Iron Man
Iron Man Hero 1963 Technology Avengers
Loki Villain 1949 Magic Avengers
  • This looks much better. It takes less space, and would be faster, but it still has problems.
  • The first is that a lot of data is repeated. See how we have Black Widow's information (hero_villain, first_appeared and power) twice. If we want to update that data, we'd have to update every single row with Black Widow. That will slow things down and could lead to errors.
  • The second is more subtle. Did you notice that we're missing Scarlet Witch? We could include a row for her without movie data, but how would we manage later when we added Age of Ultron? Would we leave a NULL row taking up space in the database for no purpose? We could check for NULL rows and replace those when we add the first movie, but it's not ideal.
  • The answer is to separate the data into two tables. One with information about the hero, and another tracking which films they've been in.
  • You've seen the first already.
  • Let's take a look at the second. Click on the table called 'appearances' in the left pane of PHPMyAdmin.
  • Here we see a table with a list of hero/movie pairs.

Unique Index

  • Remember that every table must have a unique index. But in this list, Iron Man and Loki both appear more than once in 'character_name' and 'The Avengers' appears five times in 'movie', so neither is appropriate for our primary index.
  • Instead, we create a new field that will be unique. By custom, we call it 'id'.
  • Click on the 'Structure' tab to see how this is defined.
  • It's an integer with the 'unsigned' attribute. This means that it can only be positive numbers. In computing, where numbers are stored as a collection of ones and zeros, each called a 'bit', one bit is usually used to indicate whether a number is positive or negative. By making our index unsigned, we are doubling the number of rows we can have.
  • Also notice under 'Extra' that the 'id' field is set to 'AUTO_INCREMENT' which means that when we add rows the new row will have a number one higher than the highest number previously used. This saves us from having to find an unused number ourselves.
  • So how do we combine the data from these tables?

Joins

  • Click on the 'SQL' tab.
  • Overwrite the query with this one. You can keep it all on one line, but spreading it out makes it easier to read.
SELECT *
FROM heroes
	LEFT JOIN appearances ON heroes.character_name=appearances.character_name
  • The results now show all the information that we had in our most recent table.
  • The advantage here is that we only need to update one line in our heroes table for each hero, so there is much less risk of error and we're taking less storage space. It's also quicker to process, but that's probably not noticeable for such a small database.
  • Let's break down the query.
  • SELECT * is the same as before. It says to show all fields. In this case, show all fields from all tables included in the query.
  • FROM heroes is the same as before, though I've left out the backticks. Backticks are not required in most cases.
  • The last line is new. LEFT JOIN means that we're connecting all rows from the second table that match the first table, and we're using the 'character_name' field of both tables to identify the match (heroes.character_name=appearances.character_name).
  • Notice that we identify the fields to use by naming the database first and separating them by a . . This ensures that the database doesn't get confused because the field names are the same in both tables.
  • Also notice that Scarlet Witch appears in the results because she's in the first table, even though she's not listed in the second table. The missing information is listed as NULL. Let's see what happens if we swap the tables around.
  • We could swap the table names in the query FROM appearances LEFT JOIN heroes ON heroes.character_name=appearances.character_name, but we can also simply change the type of join from LEFT to RIGHT.
SELECT *
FROM heroes
	RIGHT JOIN appearances ON heroes.character_name=appearances.character_name
  • When you run this query, Scarlet Witch disappears because the second table (appearances) is our base and she doesn't appear in that table.
  • Let's go back to the original query so we can see Scarlet Witch.
SELECT *
FROM heroes
	LEFT JOIN appearances ON heroes.character_name=appearances.character_name
  • Before we go on, notice that we've got unnecessary information in our results. 'id' is required by the database, but for now it has no value for us. Nor is there any value in repeating the 'character_name' data.
  • Let's grab only the fields we actually need.
SELECT character_name, hero_villain, first_appeared, power, movie
FROM heroes
	LEFT JOIN appearances ON heroes.character_name=appearances.character_name
  • Notice that we've replaced * with the list of specific fields we want to include.
  • Unfortunately, when you run the query you'll see an error saying that the column 'character_name' is ambiguous. Although they're exactly the same in both tables, the database needs to be told which one to choose.
SELECT heroes.character_name, hero_villain, first_appeared, power, movie
FROM heroes
	LEFT JOIN appearances ON heroes.character_name=appearances.character_name
  • This query should give us exactly what we want.

Next: PHP Database Connections

en/web_development/databases/joins.txt · Last modified: 2023/08/16 09:33 (external edit)