Tuesday, October 06, 2015

Sorting by Day of the Week in DB2

Today's blog will walk you through a sorting "trick" that you can use when you are dealing with days of the week. A relatively common problem involves sorting a column that contains a day-of-the-week not in alphabetical order by in the order it appears in the week...

Assume that you have a table containing transactions, or some other type of interesting facts. The table has a CHAR(3) column containing the name of the day on which the transaction happened; let’s call this column DAY_NAME. Now, let’s further assume that we want to write queries against this table that orders the results by DAY_NAME. We’d want Sunday first, followed by Monday, Tuesday, Wednesday, and so on. How can this be done?

Well, if we write the first query that comes to mind, the results will obviously be sorted improperly:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY DAY_NAME;

The results from this query would be ordered alphabetically; in other words
  1. FRI
  2. MON
  3. SAT
  4. SUN
  5. THU
  6. TUE
  7. WED

And it would rare to want this type of data sorted in that order, right? The more common need would be to sort the data the way it appears on the calendar.



One solution would be to design the table with an additional numeric or alphabetic column that would allow us to sort that data properly. By this I mean that we could add a DAY_NUM column that would be 1 for Sunday, 2 for Monday, and so on. Then we could SELECT DAY_NAME but sort by DAY_NUM.

But this is a bad fix. Not only does it requires a database design change, this "fix" also introduces the possibility for the DAY_NUM value and DAY_NAME value to get out of sync... unless we are very careful, or perhaps do not allow DAY_NUM to be changed other than via an INSERT trigger that automatically populates the correct number. But requiring a trigger adds even more complexity to this "fix" which really should indicate to us that it is not a very good proposal.

A better solution uses just SQL and requires no change to the database structures. All you need is an understanding of SQL and SQL functions – in this case, the LOCATE function.  Here is the SQL:

SELECT   DAY_NAME, COL1, COL2 . . .
FROM     TXN_TABLE
ORDER BY LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME);

The trick here is to understand how the LOCATE function works: it returns the starting position of the first occurrence of one string within another string. So, in our example, LOCATE finds the position of the DAY_NAME value within the string 'SUNMONTUEWEDTHUFRISAT', and returns the integer value of that position.

So,if DAY_NAME is WED, the LOCATE function in the above SQL statement returns 10... and so on. To summarize, Sunday would return 1, Monday 4, Tuesday 7, Wednesday 10, Thursday 13, Friday 16, and Saturday 19. This means that our results would be in the order we require.

(Note: Some other database management systems have a function similar to LOCATE called INSTR.) 

Of course, you can go one step further if you’d like. Some queries may need to actually return the day of week. You can use the same technique with a twist to return the day of week value given only the day’s name. To turn this into the appropriate day of the week number (that is, a value of 1 through 7), we divide by three, use the INT function on the result to return only the integer portion of the result, and then add one:

INT(LOCATE('SUNMONTUEWEDTHUFRISAT',DAY_NAME)/3) + 1;


Let’s use our previous example of Wednesday again. The LOCATE function returns the value 10. So, INT(10/3) = 3 and add 1 to get 4. And sure enough, Wednesday is the fourth day of the week. 

You can monkey with the code samples here to modify the results to your liking. For example, if you prefer Monday to be the first day of the week and Sunday the last day of the week, simply change the text string in the LOCATE function as follows:

LOCATE('MONTUEWEDTHUFRISATSUN',DAY_NAME)

My intent with this blog posting is actually twofold. Firstly, I wanted to share a method of sorting days-of-the-week... but secondly, I hope to also have piqued your interest in using SQL functions to achieve all types of different processing requirements. DB2 offers quite a lot of functions and it seems to me that many programmers fail to use functions to their fullest.

So if you are ever face-to-face with a difficult query request, step back and take a moment to consider how DB2 functions might be able to help!

Thursday, October 01, 2015

Understanding DB2 SELECT Syntax

Most DB2 programmers think they know how to correctly code simple SQL SELECT statements. And they usually are correct, as long as you keep that adjective “simple” in the assertion. When the statement requires more than SELECT...FROM…WHERE though, problems can ensue.

One of the biggest SELECT type of problem encountered by DB2 users is related to syntax. To paraphrase Mark Twain, sometimes what people think they know, just ain’t so.

How can you find the proper syntax for SELECT statements? The DB2 SQL Reference manual contains all of the syntax for DB2 SQL, but query syntax is separated from the rest of the language. Typically, users go to Chapter 5 of the SQL Reference that contains syntax diagrams, semantic descriptions, rules, and examples of the use of DB2 SQL statements. SELECT INTO is there, but SELECT is not in this section. Well, actually, there is a placeholder page that refers the reader back to Chapter 4. Chapter 4 contains the detailed syntax information and usage details for DB2 queries using SELECT. 


Another potentially confusing aspect of DB2 SELECT is the breakdown of SELECT into three sections: fullselect, subselect, and select-statement. This causes many developers to confuse which query options are available to the SELECT statements they want to code.

Let's take a look at each of these three sections:

First up is the select-statement, which is the form of a query that can be directly specified in a DECLARE CURSOR statement, or prepared and then referenced in a DECLARE CURSOR statement. It is the thing most people think of when they think of SELECT in all its glory. If so desired, it can be issued interactively using SPUFI. The select-statement consists of a fullselect, and any of the following optional clauses: WITH CTE, update, read-only, optimize-for, isolation, queryno and SKIP LOCKED DATA.

A CTE, or common table expression, defines a result table with a table-identifier that can be referenced in any FROM clause of the fullselect that follows. Multiple CTEs can be specified following a single WITH keyword. Each specified CTE can also be referenced by name in the FROM clause of subsequent common table expressions.

The next component is a fullselect, which can be part of a select-statement, a CREATE VIEW statement, a materialized query table, a temporary table or an INSERT statement. Basically, a fullselect specifies a result table. A fullselect consists of at least a subselect, possibly connected to another subselect via UNION, EXCEPT or INTERSECT. And ever since DB2 version 9 for z/OS, you can apply either or both ORDER BY and FETCH FIRST clauses. Prior to V9, this sometimes confused folks as they tried to put a FETCH FIRST n ROWS clause or an ORDER BY in a view or as part of an INSERT. That was not allowed! But it is now.

However, a fullselect does not allow any of the following clauses: FOR FETCH ONLY, FOR UPDATE OF, OPTIMIZE FOR, WITH, QUERYNO and SKIP LOCKED DATA. A fullselect specifies a result table – and none of these afore-mentioned clauses apply.

This sometimes confuses folks. I recently had a conversation with a guy who swore that at one point he created a view using the WITH UR clause and that it worked. It didn’t when we spoke and I’m sure it never did.

Finally, a subselect is a component of the fullselect. A subselect specifies a result table derived from the result of its first FROM clause. The derivation can be described as a sequence of operations in which the result of each operation is input for the next.

I know, this can all seem to be a bit confusing. But think of it this way: in a subselect you specify the FROM to get the tables, the WHERE to get the conditions, GROUP BY to get aggregation, HAVING to get the conditions on the aggregated data, and the SELECT clause to get the actual columns. In a fullselect you add in the UNION to combine subselects and other fullselects. Finally, you add on any optional clauses (as specified earlier) to get the select-statement.

Now what could be any easier?

Actually, it is not super easy. And if you add in some of the newer SQL capabilities, like OLAP functions or temporal time travel query clauses, it gets even more complicated.


I guess the bottom line is that you really should make sure you have the SQL Reference handy (see link above) if you are doing anything other than simple selecting… because you’ll probably need it.

Monday, September 14, 2015

Are You Heading to Las Vegas for IBM Insight?

The IBM Insight conference is coming up and if you work with DB2, Big Data, analytics, data warehousing, or really, anything at all about enterprise data, then Las Vegas is the place to be the week of October 25 thru 29, 2015.

But what is IBM Insight? Well, you may remember it as the IBM Information on Demand conference, or IOD for short, Yes, IBM has renamed the conference yet again. I'm sure a lot of you can remember when there were a bunch of different "Technical Conferences" like the IMS Tech Conference or the DB2 Tech Conference. Those conferences, as well as several others, all got rolled up into IOD... which is now IBM Insight.


So with that bit of potential confusion out of the way, why should you attend this year's IBM Insight conference? Simply put, there is something of interest for everybody in a data-related profession. 

The conference boasts more than 1600 presentations that run the gamut from DB2 to IMS to Cognos to BI to Big Data to analytics to... well, you get the idea. You can experiences technical training, hands-on labs, and industry use cases delivered by over 1000 industry experts.

There will be multiple keynote sessions, focusing in on IBM's important data initiatives including:
  • Data Management
  • Advanced Analytics
  • Cloud
  • Hadoop & Spark
  • Content Management
  • Watson
Last year's event was attended by over 13,000 folks, which gives attendees a great opportunity to network with your peers and IBMers. Add to that the over 350 exhibitors at the Expo hall and you will be able to view, review, and examine all kinds of interesting software to help you manage your enterprise data.

I also want to promote my presentation at this year's conference, called Not Your Daddy's DB2! I'll talk about the changing landscape of the industry and how DB2 for z/OS has changed (and continues to change) to embrace modern IT. This session will be held on Wednesday, October 28, at 4:00 pm in the South Seas J room). If you're going to the conference, I hope to see you at my presentation.

And, as always, there will be plenty of time to kick back and relax after a long day of networking and learning. On Wednesday evening conference attendees will be treated to a free concert from Maroon 5... and, of course, everything that Las Vegas has to offer can be on your agenda, too.

So what are you waiting for? Register for the 2015 IBM Insight conference today... 

And if you register by September 18th you can get a discounted rate ($300 off).

Tuesday, September 08, 2015

Mainframe Cost Optimization



Modern businesses live in an age of financial austerity, cost containment and cutbacks. It is just a fact of life in many organizations that you need to be constantly vigilant for new ways to reduce costs. If your business relies on the mainframe -- and many of the biggest businesses do -- cost containment is of the utmost importance.
But how to cut costs? Many mainframe support groups are running thin in terms of people - so layoffs don't make a lot of sense. And the software that runs the business can't be cut. Management software supports the business systems, so cutting those may cost more than you save!

But there are things you can do. If you're interested in learning more about IBM MLC software costs, pricing/licensing, mainframe cost optimization and a solution to dynamically manage your system to reduce software costs be sure to take an hour out of your busy schedule and join me for a free webinar titled Mainframe Cost Optimization: Pricing, Licensing, the R4HA, and More! 

I'll be delivering this webinar on September 10, 2015 at 2pm EDT.

During this session I'll discuss:
  • The new mainframe pricing options including zCAP, CMP and MWP
  • The disparate moving parts of sub-capacity pricing including the R4HA
  • Methods for controlling R4HA intelligently to reduce monthly software costs.

So click here to register for the webinar and join me on September 10th. 

Friday, September 04, 2015

Influencing the DB2 Optimizer: Part 7 - Miscellaneous Additional Considerations

In this 7th, and final installment of this series on influencing the DB2 optimizer's access path choices, we will take a look at a couple of additional things to consider as you work toward improving your SQL performance.

Favor Optimization Hints Over Updating the DB2 Catalog  

Optimization hints to influence access paths are less intrusive and easier to implement than changing data in the DB2 Catalog. However, that does not mean that you should use optimization hints all the time! Do not use optimization hints as a crutch to arrive at a specific access path. Optimization hints are best used when an access path changes and you want to go back to a previous, efficient access path.

Limit Ordering to Avoid Scanning  

The optimizer is more likely to choose an index scan when ordering is important (ORDER BY, GROUP BY, or DISTINCT) and the index is clustered by the columns to be sorted.

Maximize Buffers and Minimize Data Access  

If the inner table fits in 2% of the buffer pool, nested loop join should be favored. Therefore, to increase the chances of nested loop joins, increase the size of the buffer pool (or decrease the size of the inner table, if ­possible).

Consider Deleting Non-uniform Distribution Statistics

Sometimes non-uniform distribution statistics can cause dynamic SQL statements to fluctuate dramatically in terms of how they perform. To decrease these wild fluctuations, consider removing the non-uniform distribution statistics from the DB2 Catalog.

Although dynamic SQL makes the best use of these statistics, the overall performance of some applications that heavily use dynamic SQL can suffer. The optimizer might choose a different access path for the same dynamic SQL statement, depending on the values supplied to the predicates. In theory, this should be the desired goal. In practice, however, the results might be unexpected. For example, consider the following dynamic SQL statement:

SELECT   EMPNO, LASTNAME
FROM     DSN81010.EMP
WHERE    WORKDEPT = ?

The access path might change depending on the value of WORKDEPT because the optimizer calculates different filter factors for each value, based on the distribution statistics. As the number of occurrences of distribution statistics increases, the filter factor decreases. This makes DB2 think that fewer rows will be returned, which increases the chance that an index will be used and affects the choice of inner and outer tables for joins.

These statistics are stored in the SYSIBM.SYSCOLDIST and SYSIBM.SYSCOLDISTSTATS tables and can be removed using SQL DELETE statements.

This suggested guideline does not mean that you should always delete the non-uniform distribution statistics. My advice is quite to the contrary. When using dynamic SQL, allow DB2 the chance to use these statistics. Delete these statistics only when performance is unacceptable. (They can always be repopulated later using RUNSTATS.)

Collect More Than Just the Top Ten Non-uniform Distribution Statistics   

If non-uniform distribution impacts more than just the top ten most frequently occurring values, you should use the FREQVAL option of RUNSTATS to capture more than 10 values. Capture only as many as will prove to be useful for optimizing queries against the non-uniformly distributed data.

DB2 Referential Integrity Use

Referential integrity (RI) is the implementation of constraints between tables so that values from one table (the parent) control the values in another (the dependent, or child). A referential constraint between a parent table and a dependent table is defined by a relationship between the columns of the tables. The parent table’s primary key columns control the values permissible in the dependent table’s foreign key columns. For example, in the sample table, DSN8810.EMP, the WORKDEPT column (the foreign key) must reference a valid department as defined in the DSN8810.DEPT table’s DEPTNO column (the primary key).

You have two options for implementing RI at your disposal: declarative and application. Declarative constraints provide DB2-enforced referential integrity and are specified by DDL options. All modifications, whether embedded in an application program or ad hoc, must comply with the referential constraints. Favor using declarative RI as DB2 will then be aware of the relationship and can use that information during access path optimization.

Application-enforced referential integrity is coded into application programs. Every program that can update referentially-constrained tables must contain logic to enforce the referential integrity. This type of RI is not applicable to ad hoc updates.

With DB2-enforced RI, CPU use is reduced because the Data Manager component of DB2 performs DB2-enforced RI checking, whereas the RDS component of DB2 performs ­application-enforced RI checking. Additionally, rows accessed for RI checking when using application-enforced RI must be passed back to the application from DB2. DB2-enforced RI does not require this passing of data, further reducing CPU time.

In addition, DB2-enforced RI uses an index (if one is available) when enforcing the referential constraint. In application-enforced RI, index use is based on the SQL used by each program to enforce the constraint.

If you must use application RI instead of declarative RI, be sure to also define referential constraints with the NOT ENFORCED keyword. In that case, the constraints will not be enforced by DB2, but will be documented in the DDL. And it gives DB2 additional information that can be used by the Optimizer for query optimization.

Summary

Hopefully this 7-part series on influencing DB2 access paths provided you with a nice overview of the options available to you and considerations for their use. If you are interested in learning more about SQL tuning and DB2 performance, consider purchasing the book from which this series was drawn: DB2 Developer's Guide 6th edition.

Happy SQL performance tuning!