Subqueries

Ninth in a series of posts in response to Tim Ford's #EntryLevel Challenge.


Subqueries are queries within a query – one SELECT nested within another. They can take the place of column and table references, helping you to formulate queries that otherwise would be more difficult or less efficient to express using join operations. 

As Columns

Following is an example of a subquery filling a position typically occupied by a column name or expression. The goal of the query is to list the number of products per subcategory.

SELECT ps.name,
    (SELECT COUNT(*)
    FROM Production.Product p
    WHERE p.ProductSubcategoryID = ps.ProductSubcategoryID) AS sub_count
FROM production.ProductSubcategory ps

The output looks as  follows:

name                                                 sub_count
-------------------------------------------------- -----------
Mountain Bikes                                              32
Road Bikes                                                  43
Touring Bikes                                               22
...

You can spice things up by copying the subquery into the WHERE clause and using it there to limit the query results to subcategories having less than three products:

SELECT ps.name,
    (SELECT COUNT(*)
    FROM Production.Product p
    WHERE p.ProductSubcategoryID = ps.ProductSubcategoryID) AS sub_count
FROM production.ProductSubcategory ps
WHERE (SELECT COUNT(*)
    FROM Production.Product p
    WHERE p.ProductSubcategoryID = ps.ProductSubcategoryID) < 3

name                                                 sub_count
-------------------------------------------------- -----------
Brakes                                                       2
Chains                                                       1
Derailleurs                                                  2
...

Duplicating subqueries is a bad idea though. You must change both whenever you change one, and the database optimizer might not detect that only one execution is needed. Factoring can help. 

Factoring

You can factor repeated subqueries using the WITH clause. You may not be able to eliminate repetition entirely, but you can push most of the repeated logic into the WITH clause where it occurs just one time. Here's an example, which is the preceding query reformulated:

WITH sub_count AS (
    SELECT p.ProductSubcategoryID, COUNT(*) scc
    FROM Production.Product p
    GROUP BY p.ProductSubcategoryID
    )
SELECT ps.name,
    (SELECT scc FROM sub_count
     WHERE sub_count.ProductSubcategoryID = ps.ProductSubCategoryID)
FROM production.ProductSubcategory ps
WHERE (SELECT scc FROM sub_count
       WHERE sub_count.ProductSubcategoryID = ps.ProductSubCategoryID) < 3

It takes thought and some mental gymnastics to push subqueries into the WITH clause. I had to rewrite my correlated subqueries into a non-correlated query that with one execution returns all the rows needed to satisfy all the correlations that come up during execution of the main query. I had not gotten enough sleep the night before, and it took me far longer to push through the problem and solve it than it should have. Writing SQL works better on a good night's rest.

Is my new query more efficient? Yes, it is! Figure 1 shows the execution plans of both queries. The second plan is for the version of the query written using a WITH clause to centralize the subquery logic. That query consumes 47% of the batch cost versus 53% for the first query with its duplicate logic. Factoring the subquery logic has improved efficiency by reducing execution cost.

Figure 1. The factored query runs more efficiently as denoted by its lower cost

Figure 1. The factored query runs more efficiently as denoted by its lower cost

As Inline Views

Subqueries can take the place of table and view names. Such subqueries are often termed as inline views. Here's a query that makes use of an inline view to count the number of products per category:

SELECT ProductCategoryID, SUM(sub_count)
FROM (
    SELECT ps.ProductCategoryID, ps.name,
        (SELECT COUNT(*)
        FROM Production.Product p
        WHERE p.ProductSubcategoryID = ps.ProductSubcategoryID) AS sub_count
    FROM production.ProductSubcategory ps
) AS cat_count
GROUP BY ProductCategoryID

ProductCategoryID            
----------------- -----------
                1          97
                2         134
                3          35
                4          29

(4 row(s) affected)

Following are just a few things to note about this query:

  1. The subquery in the FROM clause is a variation on the very first query in this post. 
  2. I've added ps.ProductCategoryID to that subquery, and the main query executes a GROUP BY operation on that column.
  3. I've provide the name sub_count for the column generated by the innermost subquery. This is because columns in an inline view are required to have names.
  4. I've given the name cat_count to the inline view, because inline views themselves must be named.

What about the WITH clause? There's no harm in writing a WITH clause for a single occurrence of a subquery. For example:

WITH cat_count AS (
    SELECT ps.ProductCategoryID, ps.name,
        (SELECT COUNT(*)
        FROM Production.Product p
        WHERE p.ProductSubcategoryID = ps.ProductSubcategoryID) AS sub_count
    FROM production.ProductSubcategory ps
    ) 
SELECT ProductCategoryID, SUM(sub_count)
FROM cat_count
GROUP BY ProductCategoryID

It may seem silly to factor a single occurrence of a subquery, but doing so can help readability as your queries become larger and more complex. Thoughtful naming and factoring can lead to main queries that are more compact and more readible, and thus more easily understood and maintained by downstream developers. 

At the Beginning

When you do write a WITH clause, it must always go at the beginning of the main query. Never place a WITH clause into an inline view. Following is a contrived example to illustrate this point:

WITH cat_count AS (
    SELECT ps.ProductCategoryID, ps.name,
        (SELECT COUNT(*)
        FROM Production.Product p
        WHERE p.ProductSubcategoryID = ps.ProductSubcategoryID) AS sub_count
    FROM production.ProductSubcategory ps
    ) 
SELECT SUM(all_count) AS big_total 
FROM (
    SELECT ProductCategoryID, SUM(sub_count) AS all_count
    FROM cat_count
    GROUP BY ProductCategoryID
) AS total_all_categories

  big_total
-----------
        295

(1 row(s) affected)

The subquery named cat_count is written into the WITH clause. Even though that subquery is invoked as part of an inline view, the WITH clause must remain outside of that inline view. You can't for example, write like this:

SELECT SUM(all_count) AS big_total 
FROM (
    WITH cat_count AS (
    ...
    SELECT ProductCategoryID, SUM(sub_count) AS all_count
    FROM cat_count
    GROUP BY ProductCategoryID
) AS total_all_categories

Always write your WITH clause at the beginning of your main query. That is the only location the syntax supports. If you have a query written using a WITH clause, and you want to use that query as an inline view, then you'll need to think through and do the reshuffling needed to move the WITH clause to the very beginning of the main query. 

Summary

Subqueries can help you work through writing a query in ways that align well with how you are thinking about the business question at hand. Use them for that purpose. They are also useful in shoehorning new logic into long and complex queries that you otherwise don't want to touch. Use them for that purpose too. Avoid duplication when you can, factoring duplicate subqueries into a WITH clause whenever possible. And above all, get a good night's rest!