SQL Joinery

Fourth in a series of posts in response to Tim Ford's #EntryLevel Challenge.


SQL supports three types of join operation. Most developers learn the inner join first. But there are two other join operations you should know about. These are the outer join, and the full outer join. These additional join types allow you to write in essence could be termed as optional joins

Inner Joins

The so-called inner-join is the default. It's the happy path from a theory perspective, and it's the join type most SQL developers learn first. Use it to combine related rows from two or more tables. 

For example, perhaps you want to report on all the customers in the AdventureWorks database. You might begin working that business problem by writing the following query:

SELECT c.CustomerID, p.FirstName, p.LastName
FROM Sales.Customer AS c JOIN Person.Person AS p
     ON c.PersonID = p.BusinessEntityID
     
 CustomerID FirstName   LastName
----------- ----------- -----------
      29485 Catherine   Abel
      29486 Kim         Abercrombie
      29487 Humberto    Acevedo
      ...

Implicit in this query is the presumption that all customers are persons. The inner join operation will not return customers who lack a corresponding row in the Person table. Yet there are customers in AdventureWorks without a corresponding row in Person. The implicit presumption that all customers are persons is wrong, and we need some other join operation capable of correctly addressing the business requirement by listing all the customers.

Outer Joins

Our solution lies in the outer join, which perhaps ought to have been termed the optional join. Designate a driving table using the keywords LEFT and RIGHT. Then the join to the other table becomes optional. For example, we can report on all customers – whether persons or not – by writing the following left outer join:

SELECT c.CustomerID, p.FirstName, p.LastName
FROM Sales.Customer AS c LEFT JOIN Person.Person AS p
     ON c.PersonID = p.BusinessEntityID
     
 CustomerID FirstName   LastName
----------- ----------- -----------
      29485 Catherine   Abel
      29486 Kim         Abercrombie
      29487 Humberto    Acevedo
      ...
          1 NULL        NULL
          2 NULL        NULL
          3 NULL        NULL

The keywords LEFT JOIN specify the table written to the left as the driving table. That's the Customer table, so we'll get all rows from that table in our result. Whenever possible, those rows are joined to the Person table. Nulls are returned when there are no corresponding Person rows. 

Left and right joins are the same operation. The only difference lies in the order in which you mention the tables in your query. The preceding left join can be rewritten as a right join by mentioning the tables in the opposite order:

SELECT c.CustomerID, p.FirstName, p.LastName
FROM Person.Person AS p RIGHT JOIN Sales.Customer AS c
     ON c.PersonID = p.BusinessEntityID

It's common to spell out LEFT OUTER JOIN and RIGHT OUTER JOIN. The word OUTER is optional in the syntax, but using it helps remind you and other developers maintaining the code that an optional join is in play.  

Both queries in this section give the same result. However, it's customary in practice to write all such joins as left joins. I've yet to encounter right joins in production code. I'm sure they are out there, but it's better in my opinion to stay with left joins and avoid confusing future developers.

Full Outer Joins

It's rare, but it might happen that you want both tables in a join to be optional. Perhaps you want to list all customers, and all persons, joining them where possible, and you want the entire listing in a single report. Enter the full outer join:

SELECT c.CustomerID, p.FirstName, p.LastName
FROM Sales.Customer AS c FULL JOIN Person.Person AS p
     ON c.PersonID = p.BusinessEntityID
     
 CustomerID FirstName   LastName
----------- ----------- -----------
      29485 Catherine   Abel
      29486 Kim         Abercrombie
      29487 Humberto    Acevedo
      ...
          1 NULL        NULL
          2 NULL        NULL
          3 NULL        NULL
      ...
       NULL Syed        Abbas
       NULL Kim         Abercrombie
       NULL Hazem       Abolrous
       ...

You won't get the nice groupings that I show if you execute the full outer join in AdventureWorks. I took some license and grouped the rows for the sake of clarity. You will get back the same rows, but you may see them all mixed up rather than neatly grouped. 

Results from the preceding full outer join include the following rows:

  • All rows from an inner join. These are the first three in the example output, and each row contains column values from both Customer and Person.
  • The additional rows that would be returned by a left outer join having Customer as the driving table. These are the second group in the output, and the Person columns are null. 
  • The additional rows that would be returned by a right outer join having Person as the driving table. These are the final group, and this time the Customer columns are null.

Putting the query into business terms, I would say that it returns all customers and all persons, correlating customer to person whenever possible. 

Implicit Exclusion

Earlier in the section on inner joins, I pointed out how the inner join implicitly excluded customers who were not also listed as persons. In that particular case, the business requirement was to include those customers, which led to my introducing outer joins.

But it's not always wrong to implicitly exclude results via an inner join. Think about the business requirement to list products having reviews along with the names of reviewers and their ratings. Here's a query to generate that list:

SELECT p.ProductID, pr.ReviewerName, pr.Rating
FROM Production.Product AS p INNER JOIN Production.ProductReview AS pr
     ON p.ProductID = pr.ProductID
     
  ProductID ReviewerName Rating
----------- ------------ ------
        709 John Smith        5
        937 David             4
        937 Jill              2
        798 Laura Norman      5

The inner join is a perfectly proper choice. The inner join's implicit exclusion of non-reviewed products aligns with the business requirement to list only those products having reviews.

Inner joins are also reasonable when foreign key constraints are in place to ensure that rows in one table have corresponding rows in another. Such is the case when reporting on the names of sales persons. There is a chain of foreign key constraints that require all sales persons to be employees, and all employees to be persons. 

Joins Are  Fundamental

Joins are fundamental in SQL, and you'll be hard pressed to find many production queries that are written without at least one join operation. Choose inner joins when the business question you are answering requires corresponding rows in both tables, or when database constraints ensure those same corresponding rows. Write left outer joins when you need to make joins optional in one direction. Question any alleged need for a full outer join, but you will sometimes need them.