Partition Outer-Joins

Oracle Database 10g brings a new feature called the "partition outer-join". You use the feature to "densify" data, to fill in missing rows in a result set. My first reaction was to dismiss the feature as trivial, but then I ran into a case where I needed it, and I found that, for being such a simple feature from a syntactic standpoint, it's actually quite useful, and elegant too.

NOTE! You'll understand this article better if you create the example tables and run the queries I talk about on your own systems where you can readily view all the output. Look towards the end of this email, under the heading "Example Data", for a script you can use to create and populate the example tables in your database.

For the September/October 2002 issue of Oracle Magazine, I wrote about a case in which I needed to report out rows that were missing from a table. If you can, read that article before proceeding with this one. (Unfortunately, my article is no longer online at Oracle's website). In my September 2002 article, I had the following two tables:

  CREATE TABLE agr_agreement (
     agr_id NUMBER,
     agr_begin_date DATE,
     agr_end_date DATE,
     fac_id NUMBER );
  
  CREATE TABLE fmr_facility_monthly_report (
     fac_id NUMBER,
     fmr_period DATE,
     fmr_sales NUMBER);

The fmr_facility_monthly_report table was to have one row per facility per month that the facility was open. My job was to discover any missing rows for the year 2000. I began by writing the pivot table query shown in LISTING 1 to return one row for the first of each month:

  LISTING 1: RETURN FIRST-OF-MONTH DATES
  ----------------------------------------
  SELECT ADD_MONTHS(TO_DATE(
     '01-Jan-2000','dd-mon-yyyy'),p.x)
  FROM pivot p
  WHERE (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
        BETWEEN TO_DATE('01-Jan-2000','dd-mon-yyyy')
            AND TO_DATE('01-Dec-2000', 'dd-mon-yyyy'));

I then joined, using an inner join, the results of LISTING 1's query to the agr_agreement table. This was not an equi-join, but rather a range-based join based on the facility open and close dates contained within the agreement record. The result was a set of primary keys that should have existed in the fmr_facility_monthly_report table.

  LISTING 2: GENERATE EXPECTED ROWS FOR EACH FACILITY
  ---------------------------------------------------
  SELECT agr.fac_id, ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
  FROM pivot p,
       agr_agreement agr
  WHERE (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
         BETWEEN TO_DATE('01-Jan-2000','dd-mon-yyyy')
             AND TO_DATE('01-Dec-2000','dd-mon-yyyy'))
    AND (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
         BETWEEN agr.agr_begin_date
             AND NVL(agr.agr_end_date,ADD_MONTHS(SYSDATE,-1)));

My last step was to extend LISTING 2's query into a union query, using MINUS to subtract primary keys for existing rows, leaving only those primary keys that represented missing rows. LISTING 3 shows my final solution.

  LISTING 3: REPORT THE *MISSING* ROWS
  ------------------------------------
  SELECT agr.fac_id, ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
  FROM pivot p,
       agr_agreement agr
  WHERE (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
         BETWEEN TO_DATE('01-Jan-2000','dd-mon-yyyy')
             AND TO_DATE('01-Dec-2000','dd-mon-yyyy'))
    AND (ADD_MONTHS(TO_DATE('01-Jan-2000','dd-mon-yyyy'),p.x)
         BETWEEN agr.agr_begin_date
             AND NVL(agr.agr_end_date,ADD_MONTHS(SYSDATE,-1)))
  MINUS
  SELECT fmr.fac_id, fmr_period
  FROM fmr_facility_monthly_report fmr
  WHERE fmr_period BETWEEN TO_DATE('01-Jan-2000','dd-mon-yyyy')
                       AND TO_DATE('01-Dec-2000','dd-mon-yyyy');

The solution in LISTING 3 works, and actually works rather well, largely because of the agr_agreement table. Because agr_agreement contained open and close dates for each facility, it was possible to generate the list of candidate monthly report records via an *inner* join to the pivot table.

Let's make this problem more difficult. Let's pretend that the agr_agreement table simply doesn't exist. The fmr_facility_monthly_report table should have one row for each facility, for each month during 2000. We need to write a query that lists any missing monthly reports from that year. Let's start easy. Let's just list the reports that we do have, using the query in LISTING 4.

  LISTING 4: LIST ALL EXISTING MONTHLY REPORTS
  --------------------------------------------
  SELECT fmr.fac_id, fmr.fmr_period
  FROM fmr_facility_monthly_report fmr
  ORDER BY fmr.fac_id, fmr.fmr_period;

The results from LISTING 4 aren't terribly interesting. We know we need to generate rows somehow, else how can we report on the rows that do not exist. The solution to the current problem still involves using a pivot table to generate those rows. Listing 5 shows a subquery that uses a pivot table to generate one row for the first of each month of the year 2000.

  LISTING 5: USE A SUBQUERY TO GENERATE ONE ROW PER MONTH
  -------------------------------------------------------
  SELECT fmr.fac_id, fmr.fmr_period, range.report_date
  FROM fmr_facility_monthly_report fmr
       INNER JOIN (
          SELECT ADD_MONTHS(
                    TO_DATE('01-Jan-2000',
                            'dd-mon-yyyy'),x-1) report_date
          FROM pivot
          WHERE x BETWEEN 1 AND 12) range
       ON fmr.fmr_period = range.report_date
  ORDER BY fmr.fac_id, fmr.fmr_period;

The join in LISTING 5 is an inner join. No new rows are created. The query simply returns the existing rows from fmr_facility_monthly_report, with an additional column from the join.

We can make things more interesting. Our subquery returns one row per month. Let's convert our inner join to an outer join, so that our output data also has one row per month. Listing 6 shows this new version of the query, which is interesting enough that I've also included the resulting output.

  LISTING 6: GENERATE ROWS FOR ALL MONTHS
  ---------------------------------------
  SELECT fmr.fac_id, fmr.fmr_period, range.report_date
  FROM fmr_facility_monthly_report fmr
       RIGHT OUTER JOIN (
          SELECT ADD_MONTHS(
                    TO_DATE('01-Jan-2000',
                            'dd-mon-yyyy'),x-1) report_date
          FROM pivot
          WHERE x BETWEEN 1 AND 12) range
       ON fmr.fmr_period = range.report_date
  ORDER BY fmr.fac_id, fmr.fmr_period;
  
      FAC_ID FMR_PERIO REPORT_DA
  ---------- --------- ---------
         101 01-JAN-00 01-JAN-00
         101 01-FEB-00 01-FEB-00
         101 01-APR-00 01-APR-00
         101 01-MAY-00 01-MAY-00
         101 01-JUN-00 01-JUN-00
         101 01-JUL-00 01-JUL-00
         101 01-AUG-00 01-AUG-00
         101 01-SEP-00 01-SEP-00
         101 01-OCT-00 01-OCT-00
         101 01-NOV-00 01-NOV-00
         202 01-JUL-00 01-JUL-00
         202 01-AUG-00 01-AUG-00
         202 01-SEP-00 01-SEP-00
         202 01-OCT-00 01-OCT-00
         202 01-NOV-00 01-NOV-00
                       01-DEC-00
                       01-MAR-00

Look closely at the output from this query. We do indeed get one row for each month in the year 2000. Sometimes we even get more than one row per month. There are two rows each for July through November. These results are no help at all! Facility 101, for example, is missing reports for March and December, but these query results simply don't tell us that. Likewise, facility 202 is missing reports for January through June, and that information isn't even hinted at in these results.

The fundamental problem with the non-solution in LISTING 6 is that the fmr_facility_monthly_report table has a two-part primary key: fac_id and fmr_period, while the rows generated by our subquery return only one part of that key. In order to find the missing rows for the year 2000, our subquery needs to return every combination of fac_id and month-begin date for that year. I had to think about it a bit, but there is a solution to this problem that works before Oracle Database 10g. That solution, and it's an ugly one, is shown in LISTING 7.

  LISTING 7: GENERATE A ROW FOR EACH FACILITY FOR EACH MONTH
  ----------------------------------------------------------
  SELECT fmr.fac_id, fmr.fmr_period,
         range.report_date, range.fac_id
  FROM fmr_facility_monthly_report fmr
       RIGHT OUTER JOIN (
          SELECT ADD_MONTHS(
                    TO_DATE('01-Jan-2000',
                            'dd-mon-yyyy'),x-1) report_date,
                 fac_id
          FROM pivot
          CROSS JOIN (SELECT DISTINCT fac_id
                      FROM fmr_facility_monthly_report)
                      WHERE x BETWEEN 1 AND 12) range
       ON fmr.fmr_period = range.report_date
       AND fmr.fac_id = range.fac_id
  ORDER BY range.fac_id, range.report_date;
  
      FAC_ID FMR_PERIO REPORT_DA     FAC_ID
  ---------- --------- --------- ----------
         101 01-JAN-00 01-JAN-00        101
         101 01-FEB-00 01-FEB-00        101
                       01-MAR-00        101
         101 01-APR-00 01-APR-00        101
         101 01-MAY-00 01-MAY-00        101
         ...

If you stare at the subquery in LISTING 7 long enough, you'll see that it generates one row for each facility for each month during the year 2000. Those results are then used as the non-optional table in an outer join to fmr_facility_monthly_report. The resulting output makes very clear which monthly reports are missing from the fmr_facility_monthly_report table.

Oh, but the query in LISTING 7 is inscrutable. It took me awhile to come up with it. And it's bothersome that it depends on a sub-sub-query that performs a potentially expensive SELECT DISTINCT operation against the ever-growing detail table fmr_facility_monthly_report.

The new, partition outer-join feature in Oracle Database 10g provides a much simpler solution. Begin with the query from LISTING 6, and add the single line shown in LISTING 8, which is marked by the comment /*** NEW ***/.

  LISTING 8: APPLY PARTITION OUTER-JOIN TO THE PROBLEM
  ----------------------------------------------------
  SELECT fmr.fac_id, fmr.fmr_period, range.report_date
  FROM fmr_facility_monthly_report fmr
       PARTITION BY (fmr.fac_id) /*** NEW ***/
       RIGHT OUTER JOIN (
          SELECT ADD_MONTHS(
                    TO_DATE('01-Jan-2000',
                            'dd-mon-yyyy'),x-1) report_date
          FROM pivot
          WHERE x BETWEEN 1 AND 12) range
       ON fmr.fmr_period = range.report_date
  ORDER BY fmr.fac_id, range.report_date /*** CHANGED ***/;
  
      FAC_ID FMR_PERIO REPORT_DA
  ---------- --------- ---------
         101 01-JAN-00 01-JAN-00
         101 01-FEB-00 01-FEB-00
         101           01-MAR-00
         101 01-APR-00 01-APR-00
         101 01-MAY-00 01-MAY-00
         ...

Preceding the RIGHT OUTER JOIN with the clause PARTITION BY (fmr.fac_id) causes the outer-join to be performed separately for each distinct set of fac_id values. Thus, using my example data for this article, the join is performed once for facility 101, and again for facility 202. As with the LISTING 7 solution, the final results shown in LISTING 8 clearly indicate which monthly reports are missing from the fmr_facility_monthly_report detail table.

I made one change between LISTING 6 and LISTING 8 that I haven't discussed yet. I changed the second ORDER BY column from fmr.fac_id to range.report_date. I did this only to sort the output in the same order as that from LISTING 7, so you could more easily see that both queries return the same data.

At first, I thought partition outer-joins were mere syntactic sugar; that the new syntax did nothing more than provide us with an easier and clearer way to write a query such as I've described in this article. But there's a bit more here than just a bit of elegant syntax. In the example I've just worked through, the new syntax not only provides clarity in the way the query is written, but it also provides a significant boost in performance. LISTING 9 and LISTING 10 show the execution plans for the queries from LISTING 7 and LISTING 8 respectively.

  LISTING 9: EXECUTION PLAN FOR LISTING 7's QUERY
  -----------------------------------------------
     0      SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=17 Bytes=697)  
     1    0   SORT (ORDER BY) (Cost=10 Card=17 Bytes=697)
     2    1     HASH JOIN (OUTER) (Cost=9 Card=17 Bytes=697)
     3    2       VIEW (Cost=6 Card=17 Bytes=323)
     4    3         MERGE JOIN (CARTESIAN) (Cost=6 Card=17 Bytes=442)
     5    4           TABLE ACCESS (FULL) OF 'PIVOT' (Cost=3 Card=1 Bytes=13)  
     6    4           BUFFER (SORT) (Cost=3 Card=82 Bytes=1066)
     7    6             VIEW
     8    7               SORT (UNIQUE) (Cost=4 Card=82 Bytes=1066)
     9    8                 TABLE ACCESS (FULL) OF 'FMR_FACILITY_MONTHLY
            _REPORT' (Cost=3 Card=82 Bytes=1066)  
    10    2       TABLE ACCESS (FULL) OF 'FMR_FACILITY_MONTHLY_REPORT' (
            Cost=3 Card=82 Bytes=1804)
  

  LISTING 10: EXECUTION PLAN FOR LISTING 8's QUERY
  ------------------------------------------------
     0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=47)
     1    0   SORT (ORDER BY) (Cost=7 Card=1 Bytes=47)
     2    1     VIEW (Cost=6 Card=1 Bytes=47)
     3    2       NESTED LOOPS (GROUP OUTER) (Cost=6 Card=1 Bytes=28)
     4    3         VIEW (Cost=3 Card=1 Bytes=6)
     5    4           TABLE ACCESS (FULL) OF 'PIVOT' (Cost=3 Card=1 Bytes=13)  
     6    3         SORT (ORDER BY) (Cost=4 Card=82 Bytes=1804)
     7    6           TABLE ACCESS (FULL) OF 'FMR_FACILITY_MONTHLY_REPOR
            T' (Cost=3 Card=82 Bytes=1804)

Look at how much simple the partition outer-join's execution plan is. Look too at the optimizer's estimated cost. It's 10 for the LISTING 7 solution, and only 7 for the partition outer-join solution from LISTING 8. Indeed, when executing these two queries on my lab system, I consistently see the LISTING 7 query executing 21 consistent gets, whereas the partition outer-join query requires only 14 consistent gets. That's a 1/3 reduction in I/O from a feature I originally thought to be eye-candy. Not bad Oracle, not bad.

Example Data

CREATE TABLE agr_agreement (
       agr_id NUMBER,
       agr_begin_date DATE,
       agr_end_date DATE,
       fac_id NUMBER );
    
CREATE TABLE fmr_facility_monthly_report (
       fac_id NUMBER,
       fmr_period DATE,
       fmr_sales NUMBER);
    
CREATE TABLE pivot (
       x NUMBER
       );
    
INSERT INTO agr_agreement VALUES (1,'1-Jan-2000',NULL,101);
    INSERT INTO agr_agreement VALUES (2,'1-Jul-2000','31-Dec-2001',202);
    
INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Jan-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Feb-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Apr-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-May-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Jun-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Jul-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Aug-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Sep-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Oct-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Nov-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Jan-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Feb-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Mar-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Apr-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-May-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Jun-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Jul-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Sep-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Oct-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Nov-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Dec-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Jan-2002',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (101,'1-Feb-2002',1000);
    
INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Jul-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Aug-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Sep-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Oct-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Nov-2000',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Jan-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Feb-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Mar-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Apr-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-May-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Jun-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Jul-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Sep-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Oct-2001',1000);
    INSERT INTO fmr_facility_monthly_report VALUES (202,'1-Nov-2001',1000);
    
INSERT INTO PIVOT
    SELECT huns.x+tens.x+ones.x FROM
      (SELECT 0 x FROM dual
       UNION SELECT 1 FROM dual
       UNION SELECT 2 FROM dual
       UNION SELECT 3 FROM dual
       UNION SELECT 4 FROM dual
       UNION SELECT 5 FROM dual
       UNION SELECT 6 FROM dual
       UNION SELECT 7 FROM dual
       UNION SELECT 8 FROM dual
       UNION SELECT 9 FROM dual) ones,
      (SELECT 0 x FROM dual
       UNION SELECT 10 FROM dual
       UNION SELECT 20 FROM dual
       UNION SELECT 30 FROM dual
       UNION SELECT 40 FROM dual
       UNION SELECT 50 FROM dual
       UNION SELECT 60 FROM dual
       UNION SELECT 70 FROM dual
       UNION SELECT 80 FROM dual
       UNION SELECT 90 FROM dual) tens,
      (SELECT 0 x FROM dual
       UNION SELECT 100 FROM dual
       UNION SELECT 200 FROM dual
       UNION SELECT 300 FROM dual
       UNION SELECT 400 FROM dual
       UNION SELECT 500 FROM dual
       UNION SELECT 600 FROM dual
       UNION SELECT 700 FROM dual
       UNION SELECT 800 FROM dual
       UNION SELECT 900 FROM dual) huns;