Monday, July 28, 2008

Selecting Every Other Row

One of the fun things about publishing is getting questions from readers that make you think. A recent question I received went something like this: "Can I get the odd and even number of rows from a DB2 table?"

Well, my first reaction was to think "this guy doesn't understand the way a SQL DBMS like DB2 works." The data in DB2 tables is not ordered, so there is no way to guarantee that the rows are odd or even numbered. While that observation may (or may not) have been true, it didn't help the guy. So I thought about it and came up with a possible work-around solution.

The first thing we have to do is to mimic row numbers in DB2. Until V9, DB2 did not support the row number construct (such as you can find in Oracle), and we'd like this to work for the versions in support today (V8 and V9).

So, to do this we start by using the COUNT(*) function and a table expression. A table expression is when you substitute SQL in place of the table in the FROM clause of another SQL statement. For example, consider this SQL:

SELECT  DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB;

That puts a pseudo-row number on the table that we can access in our SQL predicates. If, say, we only want to return the even results, we could write the following query:

SELECT  DEPTNO, ROWNUM
FROM DSN8810.DEPT A,
TABLE (SELECT COUNT(*) + 1 AS ROWNUM
FROM DSN8810.DEPT B
WHERE B.DEPTNO < A.DEPTNO) AS TEMP_TAB
WHERE MOD(ROWNUM,2) = 0
ORDER BY ROWNUM;

The MOD function returns the remainder of dividing the second argument into the first. So, if the remainder is zero, we have an even number. So, this query returns every other row to the result set. If you want the odd rows only, change the predicate with the MOD function to this:

WHERE MOD(ROWNUM,2) <> 0

Of course, there is no guarantee that the same exact rows will be even (or odd) for subsequent executions of this query. It all depends how DB2 optimizes the query for execution. But it does provide a nice way to produce samples of the data (perhaps to populate a test bed of data).

2 comments:

omer brandis said...

good stuff,
funny....


long time db2 user....

Anonymous said...

and of course changing mod to 3 is every third row, or 1000 for multimillion row databases, etc. gets you every nth row. But having to insert the bogus column makes it worthless for getting a sample from an existing large database -- esp. a production one that is not to be changed by the user.