Friday, November 14, 2008

There is Still Time to Attend IDUG Regional Forums

Just a quick entry today to remind everyone that IDUG has started conducting regional events. The first one was held last week in San Ramon, California, and next week two events will be held: in Camp Hill, Pennsylvania, and Kansas City, Missouri, (actually Lenexa, KS). The forums offer you an opportunity to obtain DB2 education rather quickly and inexpensively.

Each Forum offers 2 days of education with 2 tracks: one covering DB2 for z/OS and another covering DB2 for LUW. IDUG is offering full two day registrations for $425 and single day registrations for only $225.

Here are the scheduled dates:

* Camp Hill, PA - November 17 and 18
* Kansas City, MO - November 19 and 20

Check out the links above for the full list of sessions in your area.

I'll be delivering my presentation titled "DB2 9: For Developer's Only" at both of these forums. And there will be many other great speakers there, too!

Friday, November 07, 2008

More on DB2 Date and Time Data: Arithmetic Expressions

DB2 allows you to add and subtract DATE, TIME, and TIMESTAMP columns. In addition, you can add date and time durations to, or subtract them from, date and time columns. But use date and time arithmetic with care. If you do not understand the capabilities and features of date and time arithmetic, you will likely encounter some problems implementing it.

Keep the following rules in mind.

When you issue date arithmetic statements using durations, do not try to establish a common conversion factor between durations of different types. For example, the following two date arithmetic statements are not equivalent:

    1997/04/03 - 1 MONTH
  1997/04/03 - 30 DAYS

April has 30 days, so the normal response would be to subtract 30 days to subtract one month. The result of the first statement is 1997/03/03, but the result of the second statement is 1997/03/04. In general, use like durations (for example, use months or use days, but not both) when you issue date arithmetic.

Another consideration: if one operand is a date, the other operand must be a date or a date duration. If one operand is a time, the other operand must be a time or a time duration. You cannot mix durations and data types with date and time arithmetic.

If one operand is a timestamp, the other operand can be a time, a date, a time duration, or a date duration. The second operand cannot be a timestamp. You can mix date and time durations with timestamp data types.

Now, what exactly is in that field returned as the result of a date or time calculation? Simply stated, it is a duration. There are three types of durations: date durations, time durations, and labeled durations.

Date durations are expressed as a DECIMAL(8,0) number. The result of subtracting one DATE value from another is a date duration. To be properly interpreted, the number must have the format yyyymmdd, where yyyy represents the number of years, mm the number of months, and dd the number of days.

Time durations are expressed as a DECIMAL(6,0) number. To be properly interpreted, the number must have the format hhmmss, where hh represents the number of hours, mm the number of minutes, and ss the number of seconds. The result of subtracting one TIME value from another is a time duration.

Labeled durations represent a specific unit of time as expressed by a number followed by one of the seven duration keywords: YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS, or MICROSECONDS. A labeled duration can only be used as an operand of an arithmetic operator, and the other operand must have a data type of DATE, TIME, or TIMESTAMP. For example:

CURRENT DATE + 3 YEARS + 6 MONTHS 

This will add three and a half years to the current date.

Thursday, November 06, 2008

Data Driven: A Great New Book on Data Quality Issues

If you are at all involved in assuring the quality of your company’s data you need to know the work of Thomas C. Redman. Dr. Redman has been working on improving data quality for years and he has written numerous articles and books on the subject. His latest book, Data Driven: Profiting From Your Most Important Business Asset is another winner.

Redman offers the basic thesis of the book right there on page one, where he states “…bad data lie at the root of issues of international importance, including the current subprime mortgage meltdown, lost and stolen identities, hospital errors and contested elections.” After laying down the problem, the rest of the book tells us what we need to do to correct the problems.

Data Driven will help you to improve the methods you deploy for the care and feeding of your data and information; in other words, helping you to control and manage data using similar processes and controls that you deploy on your other assets (finances, people, structures, etc.) – a noble goal, indeed!

The writing is concise and snappy – you won’t get bored reading this book. The style is engaging and it is easy to read. For example, instead of just saying what to do and how to do it, which can be boring, Redman discusses many of the arguments people use to say that data quality is impossible, and then debunks them showing that data quality is possible, if approached properly and thoroughly.

There are many good ideas, charts, and graphs in Data Driven, too. One of my favorites is on page 54, where you can find a chart of the ten habits followed by those with the best data. If you buy this book, make a poster-sized photocopy of that page and hang it up on the wall of the break room and in the data folks’ cubicles. Maybe the habits will rub off on everyone as they gaze upon them everywhere.

But the best little gem in this wonderful book is the entirety of the last chapter, which is titled “The Next One Hundred Days.” In this chapter Dr. Redman offers what he calls a hundred-day panorama. It is not a grand plan because most will not have the depth of understanding required to create such a plan and have it succeed. Instead, the panorama strives for breadth, not depth, with a focus on quality. Diligent readers can follow the guidance in this chapter and thereby begin the long-term process of appreciating the importance of data quality on their business practices.

And that alone is worth the price of the book… but, of course, Data Driven offers much more and I recommend it to every IT and business professional whose job relies on accurate data.

Monday, November 03, 2008

On Date Formats, Part 2

Here is a follow-up question and answer based on my previous blog post:


Q: My format does not fit into any of the formats listed in the DB2 manuals. What if I have a DATE stored like YYYYMMDD (with no dashes or slashes) and I want to compare it to a DB2 date?


A: Okay, let's look at one potential solution to your problem (and then I want to briefly talk about the use of proper data types). First of all you indicate that your date column contains dates in the following format: yyyymmdd with no dashes or slashes. You do not indicate whether this field is a numeric or character field - I will assume that it is character. If it is not, you can use the CHAR function to convert it to a character string.


Then, you can use the SUBSTR function to break the character column apart into the separate components, for example SUBSTR(column,1,4) returns the year component, SUBSTR(column,5,2) returns the month, and SUBSTR(column,7,2) returns the day.


Then you can concatenate all of these together into a format that DB2 recognizes, for example, the USA format which is mm/DD/yyyy. This can be done as follows:


SUBSTR(column,5,2) || "/" || SUBSTR(column,7,2) || 
"/" || SUBSTR(column,1,4)


Then you can use the DATE function to convert this character string into a DATE that DB2 will recognize. This is done as follows:


DATE(SUBSTR(column,5,2) || "/" || SUBSTR(column,7,2) || 
"/" || SUBSTR(column,1,4))


The result of this can be used in date arithmetic with other dates or date durations. Of course, it may not perform extremely well, but it should return the results you desire.


Now, a quick word about using proper data types. I say this all of the time, but there are many applications and implementations "out there" that do not heed the advice: it is wise to use the DATE data type when you store dates in DB2 tables. It simplifies life later on when you want to do things like formatting dates and performing date arithmetic.


Using the appropriate data type also ensures that DB2 will perform the proper integrity checks on the columns when data is entered, instead of requiring application logic to ensure that valid dates are entered.