T-SQL Fundamentals: Outer Joins
Excerpt by Don Kiely | May 02, 2013
One of the fundamental concepts of relational databases is the normalization of data into tables, or sets, of similar data. Different data elements are grouped into separate tables. Data about employees is in the dbo.Employees table, data about orders is in the dbo.Orders table, and so forth. The process of organizing the various elements of data into tables is called normalization. The key measure of successful normalization is the ability to join tables effectively.
As the SQL standard has evolved, so has the join notation, both inner and outer. The earliest join notation uses the WHERE clause to enforce the joining criteria. This article takes a deeper dive into outer joins.
An outer join includes all of the rows from either or both of the tables in the join, even if there is a row in either table that doesn't have a match in the other table. Three types of outer joins are possible: left, right, and full.
- A left outer join includes all rows of the first table, even if there is no matching row in the second table.
- A right outer join includes all rows of the second table, even if there is no matching row in the first table.
- A full outer join includes both tables so that all rows of both tables are part of the result set even if there isn't a match in the other table.
The "left" and "right" directions simply reference which table you list first and second in the join clause, respectively.
The challenge of using outer joins is to know when they are appropriate, and which one to use. The following query uses an inner join to generate a list of all customers and the date of each customer's first order. We'll use this query to show how the different outer joins work for this relationship between the Customers and Orders table. A second query returns the total number of Customers.
SELECT CompanyName, MIN(Orders.OrderDate) AS FirstOrder
FROM dbo.Customers INNER JOIN dbo.Orders
ON Customers.CustomerID = Orders.CustomerID
GROUP BY CompanyName
ORDER BY CompanyName
SELECT COUNT(*) FROM dbo.Customers;
The query joins the Customers table to the Orders table and groups the results by company name. The MIN aggregate function returns the lowest order date for each customer. The first few rows of the result set shown in the Figure below display each customer and the date of their first order. The query returns 89 rows. You can see in the results for the second query that there are 91 customers in the table.
WARNING! Be careful with the number of rows reported by the query results window when executing multiple queries. When you first run those two SELECT queries, it reports 90 rows. But that is the total of all rows returned by the two queries, including the SELECT COUNT(*) query. To show the row count of 89 in the Figure above, we selected the grid for the first query's results, so it displays the row count for just that one query.
Why is there a discrepancy between the number of customers in the table and the list of first orders? The list, generated using an INNER JOIN, doesn't include every customer. The behavior of the inner join excludes any customers who have not placed any orders, which means that there isn't a matching order record for two customers. If you need to see a list of all customers regardless of whether they've placed an order, as well as the date of the first order for customers who have, you need to use an outer join.