Pulling Rank for the Boss

New functions introduced in SQL Server 2008 Release 2 let you assign rankings to rows in a result set. These functions are ROW_NUMBER, RANK, and DENSE_RANK. Look to them anytime you are faced with a business question involving words or phrases such as "topmost" or "bottommost", "top N" or "bottom N", or that is otherwise answerable by ranking the rows in a result set according to some criteria that you can apply to one or more columns of data.

Example Data: This is the fourth in a series of articles exploring window functions and their enhanced implementation in SQL Server 2012. If you would like to follow along, execute this script to create the example table and data. All articles in this series use the same data set.

Simple Numbering

You're a data analyst for a large department store chain that is struggling for relevancy and teetering on bankruptcy. Your boss is about to be grilled in an upcoming Board of Directors meeting. While doing his homework to prepare, he phones you with the following request:

"Give me three examples of poor quarterly performance for each department. I need them right away--in thirty minutes."

You can hear the strain in your boss's voice. You would like to ask some follow-up questions, but you wisely decide that your boss is in no mood to discuss the subtleties behind his request. You accept his request as stated and promise to get cracking.

Gulping down the last of your morning coffee, you decide to put the ROW_NUMBER function to use. You'll number quarterly sales records in order from poorest to best performance. Then you can peel off three of the worst examples by department. You begin by taking a quick look at the raw data:

/*------------------------
select department, year, quarter, sales
from quarterly_sales
------------------------*/
department       year quarter    sales
---------------- ------ ------- ----------
Appliances       2000     1235956.23
Appliances       2000     2129871.95
Appliances       2000     3398213.39
Appliances       2000     4   54111.05
Appliances       2001     1   87230.33
...

Now it's time to apply the ROW_NUMBER function. Add the following column expression to your query:

row_number() OVER
   (partition by department
    order by sales) rank

Here's an explanation of what this function is doing for you:

  1. Rows for each department are treated separately. (the partition by department clause)

  2. Each row for a department is given a number, beginning at 1. (the row_number() invocation)

  3. Row number 1 will be the row with lowest sales for a given department. Row numbers will increase as sales increase. (the order by sales clause)

Quickly you type in the new query and view the results:

/*------------------------
select department, year, quarter, sales,
   row_number() OVER
      (partition by department
       order by sales) rank
from quarterly_sales
------------------------*/
department       year quarter    salesrank
---------------- ------ ------- ---------- -----
Appliances       2000     4   54111.05   1
Appliances       2001     1   87230.33   2
Appliances       2002     1   95678.13   3
Appliances       2005     3   96078.73   4
Appliances       2001     4100010.13   5
...
Electronics      2000     1345873.14   1
Electronics      2000     2401144.56   2
Electronics      2000     3454987.33   3
Electronics      2001     1467043.14   4
Electronics      2000     4476539.23   5
...

These results are looking good. The rankings are going upwards as the sales go up. Each department is ranked separately. All that's left is to peel off the three worst rows by department. A simple, enclosing query will do the trick:

/*------------------------
select * from
(
select department, year, quarter, sales,
   row_number() OVER
      (partition by department
       order by sales) rank
from quarterly_sales
) bottom_three
where rank <= 3;
------------------------*/
department       year quarter    salesrank
---------------- ------ ------- ---------- -----
Appliances       2000     4   54111.05   1
Appliances       2001     1   87230.33   2
Appliances       2002     1   95678.13   3
Electronics      2000     1345873.14   1
Electronics      2000     2401144.56   2
Electronics      2000     3454987.33   3
Jewelry          2000     1240939.33   1
Jewelry          2000     3283993.33   2
Jewelry          2008     1287454.44   3
Mattresses       2000     4   32437.42   1
Mattresses       2000     1   34873.33   2
Mattresses       2001     1   36747.44   3
Vacuums          2000     2   56983.43   1
Vacuums          2001     1   64341.54   2
Vacuums          2000     1   65873.32   3

(15 row(s) affected)

Perfect! You rush these results to your boss. Not only have you nailed his request, you've gotten him the results in just 15 minutes. Your boss is so pleased he doesn't seem to mindat all when you take an extra five minutes to explain about the possibility of ties in the data. You fill him in on that corner case and what it can mean, ask him to phone you immediately if there's anything else he needs, and return to your own office feeling just a wee bit more secure about your job.

Respecting Ties

And then the phone rings. Yep! It's the boss. At least he's happy. You're explanation about ties got him to thinking. He doesn't want the board to catch him out on those, so he amends his earlier request and now asks you to:

"Rank the sales by department. Round all the sales numbers to the nearest $10,000. Get me the worst three rankings for each department, ties included. Do it in 20 minutes, and I'll buy you lunch."

Lunch sounds good. Rounding is easy. You promise your boss the results.

The RANK function is your secret weapon. RANK respects ties by assigning them the same rank number. Pull out your previous query and substitute RANK in place of ROW_NUMBER. Invoke ROUND as the boss asks. Done! Here is the new query with its results:

/*------------------------
select * from
(
select department, year, quarter, 
   round(sales,-4) sales, 
   rank() OVER
      (partition by department
       order by round(sales,-4)) rank
from quarterly_sales
) bottom_three
where rank <= 3;
------------------------*/
department       year quarter    salesrank
---------------- ------ ------- ---------- -----
Appliances       2000     4   50000.00   1
Appliances       2001     1   90000.00   2
Appliances       2001     4100000.00   3
Appliances       2002     1100000.00   3
Appliances       2005     3100000.00   3
Electronics      2000     1350000.00   1
Electronics      2000     2400000.00   2
Electronics      2000     3450000.00   3
Jewelry          2000     1240000.00   1
Jewelry          2000     3280000.00   2
Jewelry          2008     1290000.00   3
Jewelry          2008     2290000.00   3
Mattresses       2000     1   30000.00   1
Mattresses       2000     4   30000.00   1
Mattresses       2001     1   40000.00   3
Mattresses       2001     2   40000.00   3
Mattresses       2001     3   40000.00   3
Mattresses       2001     4   40000.00   3
Mattresses       2002     4   40000.00   3
Mattresses       2000     3   40000.00   3
Vacuums          2000     2   60000.00   1
Vacuums          2001     1   60000.00   1
Vacuums          2000     3   70000.00   3
Vacuums          2000     1   70000.00   3

(24 row(s) affected)

That took all of five minutes. Your boss gave you 20. Not wanting to make your work look too easy, you sit back and catch up on Facebook for 10 minutes before taking the elevator to your boss's floor to deliver the results.

Going Dense

As before, you did need to do some explaining about the results. Your boss honed in pretty quickly on the following data from the Mattresses Department:

Mattresses       2000     1   30000.00   1
Mattresses       2000     4   30000.00   1
Mattresses       2001     1   40000.00   3

Why the jump from rank number 1 to rank number 3? What happened to rank number 2? You explain how the two rows ranked as 1 are properly indicated as tying for first place. The third row then, is properly indicated as being third. It is third, because it is preceded by two other rows. Your boss seems happy with that explanation. He remains happy long enough for you to return to your office.

And then the phone rings. Your boss has a request:

"Would you explain again about the missing number 2?"

Grimacing, you realize the gaps in ranking numbers will be a stumbling-block for your boss when he presents the data to the board. Instead of explaining again about the missing rank number, you offer to redo the results. This time, you'll take care to avoid gaps in the numbers. The boss likes that idea—A LOT.

DENSE_RANK is your friend here. You can use DENSE_RANK in the same manner as RANK, but the difference is that DENSE_RANK eliminates gaps in the numbering. That pesky jump from 1 to 3 will be gone, and that's one less detail for your boss to have to explain to a bunch of bored [sic] members who could care less.

You decide to modify the query as follows:

select department, year, quarter, sales, drank from
(
select department, year, quarter, 
   round(sales,-4) sales, 
   rank() OVER
      (partition by department
       order by round(sales,-4)) rank,
   dense_rank() OVER
      (partition by department
       order by round(sales, -4)) drank
from quarterly_sales
) bottom_three
where rank <= 3;

Here's an explanation of the changes:

  1. You retain the call to RANK in the subquery. You do that, because your boss wants the same results as before. He wants the bottom three performing quarters by department, including ties.

  2. For the same reason, you retain the rank <= 3 predicate in the WHERE clause.

  3. You add a new column expression invoking DENSE_RANK. This gives you rankings for the board members that have no gaps.

  4. You rewrite the SELECT list so the query reports the results from DENSE_RANK. Your boss gets the data he wants because of the WHERE clause. But the gaps are gone, giving him one less thing to explain to the board.

Following are the results from this new iteration of the query:

/*------------------------
select department, year, quarter, sales, drank from
(
select department, year, quarter, 
   round(sales,-4) sales, 
   rank() OVER
      (partition by department
       order by round(sales,-4)) rank,
   dense_rank() OVER
      (partition by department
       order by round(sales, -4)) drank
from quarterly_sales
) bottom_three
where rank <= 3;
------------------------*/
department       year quarter    salesdrank
---------------- ------ ------- ---------- ------
Appliances       2000     4   50000.00    1
Appliances       2001     1   90000.00    2
Appliances       2001     4100000.00    3
Appliances       2002     1100000.00    3
Appliances       2005     3100000.00    3
Electronics      2000     1350000.00    1
Electronics      2000     2400000.00    2
Electronics      2000     3450000.00    3
Jewelry          2000     1240000.00    1
Jewelry          2000     3280000.00    2
Jewelry          2008     1290000.00    3
Jewelry          2008     2290000.00    3
Mattresses       2000     1   30000.00    1
Mattresses       2000     4   30000.00    1
Mattresses       2001     1   40000.00    2
Mattresses       2001     2   40000.00    2
Mattresses       2001     3   40000.00    2
Mattresses       2001     4   40000.00    2
Mattresses       2002     4   40000.00    2
Mattresses       2000     3   40000.00    2
Vacuums          2000     2   60000.00    1
Vacuums          2001     1   60000.00    1
Vacuums          2000     3   70000.00    2
Vacuums          2000     1   70000.00    2

(24 row(s) affected)

You have the same 24 department/quarter combinations as before. The selection criteria is unchanged. What is changed is that the gaps in the rankings are gone. Those three rows for the Mattresses Department that bothered your boss last time now look as follows:

Mattresses       2000     1   30000.00    1
Mattresses       2000     4   30000.00    1
Mattresses       2001     1   40000.00    2

The ranking goes smoothly from 1 to 2. No gaps. Nothing to provoke questions from pesky board members during your boss's presentation.

You rush the results upstairs. The boss is ecstatic. He's got all he needs now for the Board of Directors meeting. He decides to call it a morning. You leave for an early lunch with the boss.

Note! The query results in this article just happen to have come out grouped by department and sorted by rank. That apparent grouping and sorting is a side-effect of how I created the table combined with the execution of the window functions. It is a coincidence, nothing more. Do not be lulled into thinking that invoking a window function is a way to control the order in which rows are returned. If ordering is important, you must include an ORDER BY clause.