SQL*Loader: All-Null Lines

Two months ago, a reader wrote in with what appeared to be a simple problem. He was using SQL*Loader to load account transfer data similar in form to the following, and wanted to skip lines with null values:

98532, Andy, Tom, 10/1/03
36325, Tom, Andy, 12/1/03
48924, Tom,,12/1/03
49832,,,
,,,,
,,,,

I don't use it much, but SQL*Loader supports a WHEN clause that lets you restrict a load to records meeting criteria that you specify. This seemed the perfect application for WHEN. After tinkering a bit to ensure I had the syntax correct, I sent back the following solution, and went to bed that night feeling mighty good about having solved the problem so easily:

LOAD DATA
INFILE account_transfer.csv
TRUNCATE INTO TABLE account_transfer
WHEN (account_number <> '') AND (transfer_from <> '')
 AND (transfer_to <> '') AND (transfer_date <> '')
TRAILING NULLCOLS
(
   account_number CHAR TERMINATED BY ',',
   transfer_from CHAR TERMINATED BY ',',
   transfer_to CHAR TERMINATED BY ',',
   transfer_date DATE 'mm/dd/yy'
)

I should've known better. I awoke the next morning, poured my morning coffee, opened my email, and found a clarification from the same reader. I hadn't solved his problem at all. I had only misunderstood him. He wanted to skip lines with NULL values, but only when *all* values were null. If any one of the fields in a record contained a value, any value at all, it was important to attempt to load the record.

Now the problem was much more interesting! WHEN defines records to load, not to exclude. Thus, to exclude records with all NULLs, I had to include records with the opposite condition: I had to include records with at least one non-NULL value. This implied an OR expression such as:

WHEN (account_number <> '') OR (transfer_from <> '')
  OR (transfer_to <> '') OR (transfer_date <> '')

Looks good, doesn't it? Guess what? SQL*Loader doesn't allow the use of OR in the WHEN clause. Such a seemingly simple solution, and I couldn't implement it! What to do?

Now I had the sort of problem that gets my blood racing. The solution wasn't going to be straightforward. It would take a bit of creativity to put some pieces together in a way that worked for what this reader needed to do. I love these sorts of problems.

Recalling my earlier article on SQL*Loader, I wondered whether I could look at the problem a bit differently. Rather than test each field, could I test the entire line? Indeed! That might work. I added the following field as the first of my field definitions:

   entire_line FILLER CHAR TERMINATED BY '|',

For a delimiter, I chose the vertical bar (|), which never appeared in the reader's data. Thus, entire_line took on the value of all characters in each input record.

Next, I rewrote my WHEN clause to test for a series of delimiters in the entire_line field:

   WHEN (entire_line <> ',,,')

Conveniently, the job that created the data file the reader was trying to load always included all the delimiters. If all field values were NULL, the record consisted of a known number of commas, and I could test for those. The WHEN clause above loads records consisting of anything other than that known number of commas. In other words, it loads any record in which any one field is non-NULL.

Lastly, I rewrote the account_number definition to reposition SQL*Loader at the beginning of the input record:

   account_number POSITION(1) CHAR TERMINATED BY ',',

My final control file looked as follows:

LOAD DATA
INFILE account_transfer.csv
TRUNCATE INTO TABLE account_transfer
WHEN (entire_line <> ',,,')
TRAILING NULLCOLS
(
   entire_line FILLER CHAR TERMINATED BY '|',
   account_number POSITION(1) CHAR TERMINATED BY ',',
   transfer_from CHAR TERMINATED BY ',',
   transfer_to CHAR TERMINATED BY ',',
   transfer_date DATE 'mm/dd/yy'
)

SQL*Loader processes fields sequentially, so entire_line was set to the contents of an entire record. Then SQL*Loader would go back to position 1, to the beginning, and work through the comma-delimited fields.

The solution worked. The reader tested it and put it into production. Records with all nulls contained only a string of commas, which the WHEN clause detected, and those records were omitted from the load. Everything else was loaded.

I love problems like this. Solving them is like working with Legos. You have something in mind to build, and you have a whole bunch of different pieces, and somehow you need to assemble those pieces in a way that builds what you want. Sometimes the solution is straightforward, sometimes not. Sometimes you have a piece that is almost a perfect fit, but almost isn't good enough, and you can't change the piece, so you have to back up and redesign your solution to accommodate the Lego pieces you do have.

Sample Data

Sample data and two SQL*Loader control files are part of the article above. Use the following SQL to create the target table:

CREATE TABLE account_transfer (
   account_number VARCHAR2(5),
   transfer_from  VARCHAR2(5),
   transfer_to    VARCHAR2(5), 

   transfer_date  DATE);