Getting Creative with SQL*Loader

SQL*Loader is a powerful tool for loading data into an Oracle database. For some reason, it's also a difficult tool to explain. Most Oracle practitioners, that I've met anyway, learn to use SQL*Loader by example. Few delve beyond the basics to realize just how powerful and flexible SQL*Loader can be. Recently, a reader emailed me with a problem, the solution to which shows off a little understood facet of SQL*Loader, and illustrates how just a little creative thinking can take you quite far with this venerable tool. The reader presented me with the following input data:

code1, description, 34
code2, another description, 23
code3, yet one more description, 67
code4, the last description, 89

The reader was attempting to load this data into a table defined as follows:

 Name           Null?    Type
 -------------- -------- ------------
 CODE                    VARCHAR2(7)
 DESCR                   VARCHAR2(50)
 QTY                     NUMBER
 VALUE                   NUMBER

So far this appears to be a very simple scenario. But the reader presented me with one, quite interesting twist to what is otherwise a trivial load: the third value in the input data, which was a number, had to be loaded into both the QTY and the VALUE columns. If the input data were columnar in nature, storing the third input field into two database columns would be an easy requirement to meet. You would simply write a control file that defined four fields, and the last two fields would both cover the same column positions. For example;

LOAD DATA
INFILE 'c:\a\test.csv'
REPLACE INTO TABLE test
(code POSITION(1:7) CHAR,
 descr POSITION(8:50) CHAR,
 qty POSITION(58:59) INTEGER EXTERNAL,
 value POSITION(58:59) INTEGER EXTERNAL)

You can see here that both qty and value are defined over positions 58-59 in the input record, thus both columns would take the same value. With delimited data, the solution to the problem of setting two columns to the same value isn't quite so obvious. However, there is a solution. At least, there is a solution so long as you are running Oracle8i or higher. The solution takes advantage of two little-known facets of SQL*Loader operation: you can use the POSITION clause with delimited data, and you can get SQL*Loader to read through the same input line twice. Following is the control file to load the data shown earlier into the first three columns of the target table. This represents the trivial case, in which each input field is loaded into just one database column:

LOAD DATA
INFILE 'c:\a\test.csv'
REPLACE INTO TABLE test
(code CHAR TERMINATED BY ',',
 descr CHAR TERMINATED BY ',',
 qty INTEGER EXTERNAL TERMINATED BY ',')

So far so good, but our requirement is to load the third field, the numeric field, into two columns: qty and value. How do we do that? Realizing that it's possible to use a POSITION clause with delimited data, we could attempt write such a clause into the definition of a fourth field. The idea being that our POSITION clause would point to the first character of the third field, causing SQL*Loader to parse that field a second time, this time loading it into the value column in our target table. For example:

LOAD DATA
INFILE 'c:\a\test.csv'
REPLACE INTO TABLE test
(code CHAR TERMINATED BY ',',
 descr CHAR TERMINATED BY ',',
 qty INTEGER EXTERNAL TERMINATED BY ',',
 value POSITION(?) INTEGER EXTERNAL TERMINATED BY ',')

The idea of using POSITION to go back and reread an input field is a good one, but there's one hitch: because our data is delimited, the starting position of that third field will vary from one record to the next. My use of a question-mark (?) for the POSITION in this most recent control file acknowledges that fact. There is still hope! With delimited data, there is one position you can always count on, and that's the first position. Part of the solution is to use POSITION(1) to backup all the way to the beginning of the record. But what then? Now we're pointing at the code field, which we do not want to load twice. True enough. After going to POSITION(1), we'll have to skip the first two fields, and we can do that using the FILLER keyword introduced in Oracle8i. We can then load the third field a second time, into the value column. Following is the control file to do the load as the reader requested it to be done:

LOAD DATA
INFILE 'c:\a\test.csv'
REPLACE INTO TABLE test
(code CHAR TERMINATED BY ',',
 descr CHAR TERMINATED BY ',',
 qty INTEGER EXTERNAL TERMINATED BY ',',
 code_skip FILLER POSITION(1) CHAR TERMINATED BY ',',
 descr_skip FILLER CHAR TERMINATED BY ',',
 value INTEGER EXTERNAL TERMINATED BY ',')

This control file causes SQL*Loader to process each input record in the following manner:

  1. The first comma-delimited input field is stored in the target table's code column.

  2. The second comma-delimited field is stored in the descr column.

  3. The third comma-delimited field is stored in the qty column.

  4. SQL*Loader starts over at the beginning of the input record. This is because POSITION(1) is used in the field definition for code_skip. SQL*Loader then reads the first comma-delimited field again, and skips it as directed by the FILLER keyword.

  5. SQL*Loader reads the second comma-delimited field, this time named descr_skip, and again skips it because the FILLER keyword is used in the field's definition.

  6. SQL*Loader reads the third comma-delimited field, and stores the result in the value column.

Thus, SQL*Loader parses each input record twice, and stores the third, comma-delimited field in both the qty column and the value column of the target table. Only one INSERT statement is executed, after all the input fields have been processed. Key to the solution shown in this article is the recognition that POSITION can be used even with delimited data to reset SQL*Loader's parser back to the beginning of an input line. Also key is the use of the FILLER keyword, new in Oracle8i, to skip unwanted fields in the input data. Not every load problem can be solved within SQL*Loader, but many can be, with just a little creative thinking. Work that you can push onto SQL*Loader represents work you don't need to program yourself, temporary tables you don't need to create, and bugs you don't need to fix. The result is a simpler and more reliable load.