Thursday, May 27, 2010

Going Meta

Lately I've been using some of the same tricks over and over again in different contexts. Essentially I've just been doing a lot of complex queries, where I select a set of records first, and then select other data out of the first set:

select my_set.X, my_set.Y, my_set.Z
from (
select * from whatever_it_is
where my_item = 'something'
) my_set


Why would I need to do this? Well, for one thing it breaks a problem down into smaller bits, so it is easier to work on. I can write and validate a query that gets all the records I might want to work with, and then write two or three queries that use it to produce friendlier output. I can also use it to do things like counts:

select my_set.X Item,
count(case when my_set.Y = 'Chicago' then 1 else null end) Num_Chicago,
count(case when my_set.Y = 'Phoenix' then 1 else null end) Num_Phoenix,
count(my_set.Y) Total
from (
select * from whatever_it_is
where my_item = 'something'
) my_set
group by my_set.X

This will produce something that looks like this:



ItemNum_ChicagoNum_PhoenixTotal
Item 1 4 1 5
Item 2 2 2 4

You can do grand totals using a UNION ALL with another query:

select my_set.X Item,
count(case when my_set.Y = 'Chicago' then 1 else null end) Num_Chicago,
count(case when my_set.Y = 'Phoenix' then 1 else null end) Num_Phoenix
from (
select * from whatever_it_is
where my_item = 'something'
) my_set
group by my_set.X
UNION ALL
select 'Totals',
count(case when my_set.Y = 'Chicago' then 1 else null end) Num_Chicago,
count(case when my_set.Y = 'Phoenix' then 1 else null end) Num_Phoenix,
count(my_set.Y) Total
from (
select * from whatever_it_is
where my_item = 'something'
) my_set

The key to doing a UNION is you need your columns to match up. So note that I have the same number of columns (3) and I used the same names (Item, Num_Chicago, Num_Phoenix, Total) to describe them. By not including my_set.X in the second query I end up with total counts of all the rows, not simply of one item. The output from this should look like:




ItemNum_ChicagoNum_PhoenixTotal
Item 1 4 1 5
Item 2 2 2 4
Totals 6 3 9

I've used the same sort of complex queries to pull the earliest or latest item by category from a set of records; it is a fairly simple concept but can be very powerful.

SQL*Plus formatting and automation

Here is some advice on how to run your SQL and generate a file with the delimiter of your choice.


  1. Login to sqlplus
  2. Set your page size (how often you get headers – so set this to be arbitrarily large), line size (try to make this long enough to hold a complete row of data), column separator, and suppress the “X rows selected” feedback.
    set pagesize 5000 linesize 500 colsep ',' feedback off
  3. Setup an output file
    spool output.txt
  4. Run your sql command
    select ... from ... where ... ;
  5. Turn off the output
    spool off
    At this point you should have a file called output.txt with your data. Don’t worry if you have some command data in there, just read on.
  6. You might also read up on how to format columns – in particular you can have nicer column headings:
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14357/ch6.htm

Once you have tweaked these settings to your liking, you can automate the process pretty easily. Put your commands into a file. To suppress output on screen you can set termout off (this only works with a command file):

set pagesize 5000 linesize 500 colsep ',' feedback off termout off
spool output.txt
select whatever;
spool off
quit

Then run sqlplus with that file as an argument (the -S will run the commands silently):

sqlplus -S login/pwd@server @commands.txt

An added bonus of doing it this way is you won’t see the sql commands in your output file anymore.

This is particularly handy if you need to do some scripting in an environment where you don't have control over what libraries are installed; I recently used something similar on a unix machine that had Perl installed but not DBD::Oracle. I shelled out and ran my query using sqlplus, and parsed the output in the script. It won't perform as well as a built-in, but for occasional use it is very low rent.