Thursday, July 30, 2009

munging numbers and strings in Oracle

An item of current interest to me is a compound number. Sometimes I see it in the database as one big number: 12345678901234 and at other times I see it in the database as two shorter numbers: 1234567890 and 1234. So how can I go back and forth? There are a few tricks I've learned to keep up with it:


  1. Always make sure you know which data type the item you want is being stored as. Sometimes this ID is stored as a string, sometimes as a number. Figuring this out depends on the tools you use, but make sure you know what you're working with - you can usually use DESC to find out if you're just sitting at a SQL prompt. Also, I'm assuming in these examples that I can trust my data to be entered correctly (e.g. no one accidentally put a letter in somewhere), which is probably a stretch in real life but will work in my particular case.

  2. When it is two numeric fields and I need one string field, I can format and concatenate the two numbers into a big string:
    CONCAT(TO_CHAR(first_number, 'FM0000000000'),TO_CHAR(second_number, 'FM0000'))
    It took some looking things up to figure out that adding the 'FM' at the beginning of the format string would keep Oracle from leaving a space at the beginning for the +/- sign, which was key to actually matching the string.

  3. When it is one big string field and I need two numbers, I can split them:

    SELECT whatever FROM you_know_what WHERE TO_NUMBER(SUBSTR(big_string_field, 1, 10)) = first_number AND TO_NUMBER(SUBSTR(big_string_field, -4)) = second_number

    Number positions start with 1 in SQL (not 0 like in code), so that first SUBSTR gets the first 10 characters of my string, and the second gets the last 4 (negative position counts backwards from the end of the string). Note that if there are non-numeric characters in the string, TO_NUMBER would fail, everyone dies, oh the embarrassment.

Friday, July 24, 2009

Stupid Oracle SQL Tricks, part 2

grouping dates by week:
to_char(my_date + (1-to_char(my_date,'D')),'YYYY-MM-DD') Week_Of_My_Date
You're basically subtracting the day of the week and ending up with Sunday for any day in a given week.

finding records by today's date or last month or two or three months ago:
select count(my_whatever) from my_table t where t.my_date > add_months(current_date, -2) and t.my_date <= current_date
Oracle uses sysdate for today, which is really common and great and all but I've learned that almost all database systems (really all but SQL Server) support current_date, which is more descriptive and works in more places, FTW!


the median (middle) item of a list:
select PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY my_item DESC) median_item from my_table
This is slightly more complicated than just the 'median' (meaning it may not do what you think its doing) so to understand further take a look at the Oracle manual.

learning SQL

1. Get yourself a large database to play with. If you don't have database software, I suppose you'll need to either a) learn how to install it (almost everyone has a free version) or b) ask someone for access to one of their databases (someone where you work might have a development box you can hack away at).

2. Get yourself a good reference book. Everyone I know still looks stuff up. Alternately, you can look stuff up on the web but I like knowing what my database is capable of AND what the others are capable of, which makes this book invaluable.

3. Look over your sample database. Ask yourself questions about the data. Imagine what users will ask you. For instance, how can I find all the stores in Michigan? How can I find all the top-grossing film at all my stores? How can I categorize and label films based on high-sellers, medium-sellers, and low-sellers? Can I categorize and label the films without touching the database design? Who acted in the most movies? It can be (well, arguably) fun to insert some random bad data into your database to make answering these questions harder. What if you have a null to skip over? What if only half the items get filled in properly? How do you workaround this? How can I do these things using a single query rather than two or three?

4. Once you have answered some questions, learn how to deliver those as something people can use. How can I get the data into Excel? On the web? Into a google document? Into a pretty looking, printable thingy? All of those things are important.

Lots of database books start off with how to layout and normalize data (i.e. "database design"), but the times I've needed to do that are somewhat few and far between compared with when I've needed to use data effectively, so I submit that might be a better place to start.