Monday, July 09, 2007

New Built-in Functions [DB2 9 for z/OS]

DB2 9 for z/OS introduces a bevy of new built-in functions (BIFs) for programmers to use in their SQL statements. It is important to keep track of the BIFs available in DB2 because BIFs simplify your coding and development. Invoking a function is always easier than trying to write the equivalent functionality in your host language code. And the BIF will work properly, whereas you cannot always be so sure about your own (sometimes buggy) code.

So, what new function functionality do we get with DB2 9 for z/OS? First of all, we get some new ASCII and EBCDIC conversion functions. The ASCII_CHR function returns the character that has the ASCII code value that is specified by the argument.; and the ASCII_STR function returns a string, in the system ASCII CCSID that is an ASCII version of the string. Knowing that, I bet you can guess what EBCDIC_CHR and EBCDIC_STR do.

We also get UNICODE and UNICODE_STR functions in DB2 9 for z/OS. The UNICODE function returns the Unicode UTF-16 code value of the leftmost character of the argument as an integer. And the UNICODE_STR function? It returns a string in Unicode UTF-8 or UTF-16 (depending on the specified parameter) representing a Unicode encoding of the input string.

Perhaps more interesting is soundex support for testing whether two strings sound the same as each other. DB2 offers two functions here: SOUNDEX and DIFFERENCE. The SOUNDEX function returns a 4 character code that represents the sound of the words in the argument. The result can be used to compare with the sound of other strings. The data type of the result is CHAR(4). So what does this mean? Consider the following example:




SELECT LASTNAME
FROM DSN910.EMP
WHERE SOUNDEX(LASTNAME) = SOUNDEX(’Smith’);



This query would return not only employees with a last name of “Smith,” but also anything that sounds like Smith, such as Smythe.

The DIFFERENCE function is related to SOUNDEX. It returns a value from 0 to 4 where the number represents the difference between the sounds of two strings based on applying the SOUNDEX function to the strings. The higher the value, the closer the two strings are to sounding alike. Consider:


SELECT DIFFERENCE(’CONSTRAINT’,’CONSTANT’),
SOUNDEX(’CONSTRAINT’),
SOUNDEX(’CONSTANT’)
FROM SYSIBM.SYSDUMMY1;

This example returns the values 4, C523, and C523. Since the two strings return the same SOUNDEX value, the difference is 4 (the highest value possible). The more different-sounding the two strings are, the smaller the number would be.

Another interesting series of BIFs are focused on date and time data. These functions include EXTRACT, MONTHS_BETWEEN, and various new timestamp-related functions.

EXTRACT returns a portion of a date or timestamp. You can use EXTRACT to slice up a date/time value into its component pieces. Consider:

SELECT BIRTHDATE,
EXTRACT(DAY FROM BIRTHDATE) AS DAY,
EXTRACT(MONTH FROM BIRTHDATE) AS MONTH,
EXTRACT(YEAR FROM BIRTHDATE) AS YEAR
FROM DSN8910.EMP;

This query would return the entire date, along with each component (year, month, day) of the date as a separate column. You can use one function (EXTRACT) to do the job of the already-existing YEAR, MONTH, and DAY functions. Of course, YEAR, MONTH, and DAY are still available for your use. Similar functionality for EXTRACT exists for time components using HOUR, MINUTE, and SECOND.

The next temporal BIF introduced with DB2 9 for z/OS is the MONTHS_BETWEEN function. It returns an estimate of the number of months between two expressions. If the first expression represents a date that is later than the second, the result will be positive; if the opposite is true the result will be negative. The result is calculated based on a 31 day month.

For example, consider this statement:

SELECT MONTHS_BETWEEN ('2007-02-20','2007-01-17')
AS MONTHS_BETWEEN
FROM SYSIBM.SYSDUMMY1;

The result of this query would be 1.096774193548387.

We also get four new timestamp-related BIFs: TIMESTAMPADD, TIMESTAMPDIFF, TIMESTAMP_FORMAT, and TIMESTAMP_ISO. The first two are rather straightforward: TIMESTAMPADD adds an interval to a timestamp and TIMESTAMPDIFF subtracts two timestamps and returns an interval. I will not get into a discussion of date/time arithmetic here, but if you are interested check out my article Q+As on Dates and DB2.

What about the other two timestamp-related BIFs? TIMESTAMP_FORMAT offers the much-needed ability to choose different display formats for a timestamp value. The valid formats that can be specified are:
• ‘YYYY-MM-DD’
• ‘YYYY-MM-DD-HH24-MI-SS’
• ‘YYYY-MM-DD-HH24-MI-SS-NNNNNN’

And instead of using the dash ( - ) as the separator, you can also use . / , : ; and blank. These separators can be used in any combination. And the VARCHAR_FORMAT function returns a character representation of a timestamp in a format specified as above. So, consider the following query:

SELECT SUBSTR(NAME,1,8) AS TSNAME,
VARCHAR_FORMAT(CREATEDTS,'YYYY-MM-DD-HH24:MI:SS') AS TSCR
FROM SYSIBM.SYSTABLESPACE;
WHERE
CREATEDTS >=
TIMESTAMP_FORMAT('2007-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS');

This query will return the name of all table spaces created since the first of the year, along with its creation timestamp using the format specified.

We also get some new string manipulation functions. The LOCATE_IN_STRING function returns the starting position of the first occurrence of one string within another string. This is basically the same as the existing LOCATE function.

We also get LPAD and RPAD functions. The LPAD function returns a string that is padded on the left, with blanks (or a specific character). The LPAD function treats leading or trailing blanks as significant. RPAD, of course, does the same but on the right. So, consider the following example:

SELECT LPAD(LASTNAME, 30, ’.’ ) AS LAST,
RPAD(FIRSTNME, 30) AS FIRST
FROM DSN910.EMP;

This query will left pad the last name with periods and right pad the first name with blanks.

OVERLAY is yet another new string manipulation function. It allows you to return a string with portions of it overlaid by a specified value. You provide the string, a substring to be overlaid, and its starting point and length, and DB2 does the rest. Learning by example is simpler than trying to explain how it works, so here goes:

SELECT CHAR(OVERLAY('PLATELET','CEMEN',4,4,OCTETS),9),
CHAR(OVERLAY('INSERTING','IS',4,2,OCTETS),10),
FROM SYSIBM.SYSDUMMY1;

The results returned by this query would be:
• 'PLACEMENT ' (starting at position 4 overlay 4 bytes with 5 bytes 'CEMEN')
• 'INSISTING' (starting at position 4 overlay 2 bytes with 'IS')

Additional BIFs include RID, which returns the RID of a row, COLLATION_KEY which returns a varying-length binary string that represents the collation key of the expression in a named collation, DECRYPT_BINARY which adds the ability to decrypt the new BINARY and VARBINARY data types, and NORMALIZE_STRING which takes a Unicode string argument and returns a normalized string. And, of course, we also get scalar functions to support the new data types in DB2 9 (BIGINT, BINARY, VARBINARY and DECFLOAT).

Finally, DB2 9 for z/OS adds three new aggregate functions: CORRELATION, COVARIANCE, and COVARIANCE_SAMP.

The CORRELATION function returns the coefficient of correlation of a set of number pairs. And the COVARIANCE and COVARIANCE_SAMP functions return the (population) covariance of a set of number pairs.

So, as you can see, DB2 9 for z/OS continues adding new and useful functions to simplify our development efforts with DB2.

No comments: