Using SQL JOINs Statements
March 21, 2020
Sometimes you won’t find all the information you need from a single “normalized” table.
Normalized (or normalization) is referred to a database design process in which tables are restructured and reorganized over several levels called normal forms (NF). Typically, online transaction processing (OLTP) databases are normalized to the third normal form (3NF). A table normalized to the 3NF usually does not have data redundancies and the table represents only a single entity.
A tedious process is to pull data from one table, and store them in a flat file like an Excel sheet. You’d continue pulling more related data from another table and add them to the spreadsheet. Repeat the process until you get all the data you need. Imagine you’re dealing with not just two or three tables but a hundred tables and pulling thousands or millions of data. That would be a DBA’s nightmare!
Fortunately, there’s an easier and more efficient way. RDBMS allows you to combine tables into a virtual table through a process call
JOIN but using four kinds of
JOIN statements or clauses:
JOIN) – joins two tables based on a common field between them and returns all rows where there is a match between them.
LEFT JOIN– Returns all rows from left table, and only matched rows from right.
RIGHT JOIN– Returns all rows from right table, and only matched rows from left.
FULL OUTER JOIN– Returns all rows where there is a match in one table.
CROSS JOIN– Joins every record on one side of the
JOINwith every record on the other side of the
JOINresulting in a Cartesian product (ie. n X m = nm unique records).
LEFT JOIN and
RIGHT JOIN are used together they yield the result as an
OUTER JOIN, so it’s redundant. It’s better to just use
OUTER JOIN to accomplish both. Thus, they’re listed under
OUTER JOIN as alternatives.
Let’s see how
JOIN works. Take the following two tables for this example.
|1||My Fair Lady||1964|
This is a small table, but it’s what a one-to-one relationship would look like. A one-to-one (or 1:1) means there are no duplicates in either table, and one record in one table matches exactly to one record in the other. If you recall your algebra or calculus, it’s like a one-to-one function.
Notice also the “FilmID” column in both tables. They both contain the same film ID of “1”. One of them is called a
PRIMARY KEY (or PK) and the other is a
FOREIGN KEY (or FK).
Say, you want to pull information based on a film, and you want My Fair Lady.
You could do this:
SELECT * FROM Films WHERE FilmID = '1'
This will give you information the film’s ID, Title, and released date.
SELECT * FROM Actors WHERE FilmID = '1'
You get a list of all the actors who were in the film with a matching ID of “1”. In this case it would include only Rex Harrison and Audrey Hepburn.
The problem is they’re pulled separately and you’ll have to manually cut and paste them to a table. That’s laborious work! What if you can achieve it by pulling data from both tables at the same time? That’s exactly what
JOINs do. By using a
JOIN clause, you can combine only the necessary data from both tables into a third table, a temporary table (virtual table) that you can now use as shown below:
|1||My Fair Lady||1964|
|ActorID (PK)||FilmID (FK)||FirstName||LastName|
TABLE: TEMPORARY JOINED TABLE
|1||My Fair Lady||1964||Rex||Harrison|
|1||My Fair Lady||1964||Audrey||Hepburn|
Since this table is not unique but has overlapping data, it is no longer a one-to-one but a one-to-many (1:M). There’s one film which maps to many actors, thus one-to-many. Nonetheless, this would be something more useful and more efficient!