Tuesday, February 25, 2014

Dynamic SQL - Let's Do The Math

We've come a long way in the world of DB2 in the past decade or so. I remember way back when it was common for DBAs to say "If performance is an issue, dynamic SQL should not be used... always use static SQL!"  But today, in 2014, it is usually the case that dynamic SQL is the predominant form of new development.

Now a lot of things have changed to make this the case. Particularly that most new applications are being developed for distributed and web applications, instead of traditional mainframe, COBOL applications. And dynamic SQL is the default way to access DB2 from these type of apps.

But you know what? Even if you are developing traditional mainframe COBOL programs, dynamic SQL can be a better solution for you.

The Basics

Before we go on, let's tackle a few of the basics. What makes dynamic SQL different than static SQL?  Well, static SQL is optimized prior to program execution.  Each and every static SQL statement in a program is analyzed and optimized during the DB2 Bind process.  During this process the best access path is determined and coded into your DB2 package.  When the program is executed, the pre-formulated access path is executed.

Dynamic SQL, on the other hand, is optimized at run time.  Prior to the dynamic SQL statement being executed, it must be processed by the DB2 Optimizer so that an optimal access path can be determined.  This is called the Prepare process.  Prepare can be thought of as a dynamic Bind. 

We will not go into the details of dynamic statement caching and its ability to improve dynamic SQL performance here. Suffice it to say, dynamic SQL these days can be tuned using caching. For additional details on dynamic statement caching (and REOPT parms) check out my article, Dynamic SQL Performance, on TOAD World.

Now let's turn our attention to traditional dynamic SQL development. There are four types of dynamic SQL:
  • EXECUTE IMMEDIATE
  • Non-SELECT
  • Fixed-List SELECT
  • Varying-List SELECT

EXECUTE IMMEDIATE dynamic SQL ­will (implicitly) prepare and execute complete SQL statements embedded in host-variables.  Its drawbacks are that it can not be used to retrieve data using the SELECT statement and the PREPARE is implicit within the EXECUTE IMMEDIATE; so optimization must occur every time the statement is executed.

Non-SELECT Dynamic SQL can be used to explicitly prepare and execute SQL statements in an ­application program.  The PREPARE and EXECUTE are separated so that once a statement is prepared, it can be executed multiple time without re-optimization.  However, as its name implies, Non-SELECT dynamic SQL can not ­issue the SELECT statement. 

Fixed-List SELECT can be used to explicitly prepare and execute SQL SELECT statements ­where the exact columns to be retrieved are always known in advance.  The columns to be retrieved must be known at the time the program is being coded and they can not change during execution.  This is necessary ­in order to create the proper working-storage declaration for ­host-variables in your program. 

If you do not know in advance ­the exact columns that are to be accessed, you can use Varying-List SELECT dynamic SQL.  In this case, pointer variables are used to maintain the list of selected columns.  Although Varying-List SELECT is the most complicated type of dynamic SQL, it also provides the most flexibility for dynamic SELECT statements.  Changes can be made "on the fly" to tables,­ and columns, and predicates.  Because everything about the query­ can change during one invocation of the program, the number and­ type of host-variables needed to store the retrieved rows cannot­ be known beforehand.  This will add considerable sophistication and complexity to­ your application programs.

Mathematical Reason to Reconsider Dynamic SQL

Even if the decreasing cost of dynamic SQL and the newer performance improvements like dynamic statement caching do not compel you to use dynamic SQL, there is at least one situation where dynamic SQL should almost always be chosen over static SQL:  when numerous combinations of predicates can be chosen by a user at run-time.

Consider the following:  What if, for a certain query, there are twenty possible predicates.  The user of the program is permitted to choose up to six of these predicates for any given request.  How many different static SQL statements need to be coded to satisfy these specifications?

First, let's determine the number of different ways that you can choose six predicates out of twenty.  To do so we need to use combinatorial coefficients.  So, if n is the number of different ways then:

            n = (20 x 19 x 18 x 17 x 16 x 15) / (6 x 5 x 4 x 3 x 2 x 1)

            n = (27,907,200) / (720)

            n = 38,760

38,760 separate static SELECTs is quite a large number, but this is still not enough!  This number shows the total number of different ways we can choose six predicates out of twenty if the ordering of the predicates does not matter (which for all intents and purposes it does not)[1].  However, since the specifications clearly state that the user can choose up to six, we have to modify our number.  This means that we have to add in:
  • the number of different ways of choosing five predicates out of twenty
  • the number of different ways of choosing four predicates out of twenty
  • the number of different ways of choosing three predicates out of twenty
  • the number of different ways of choosing two predicates out of twenty
  • the number of different ways of choosing one predicate out of twenty


Figure 1.  Combinatorial Coefficients Calculations


Ways to Choose Six Predicates Out of Twenty

            (20 x 19 x 18 x 17 x 16 x 15) / (6 x 5 x 4 x 3 x 2 x 1) = 38,760

Ways to Choose Five Predicates Out of Twenty

            (20 x 19 x 18 x 17 x 16) / (5 x 4 x 3 x 2 x 1) = 15,504

Ways to Choose Four Predicates Out of Twenty

            (20 x 19 x 18 x 17) / (4 x 3 x 2 x 1) = 4,845

Ways to Choose Three Predicates Out of Twenty

            (20 x 19 x 18) / (3 x 2 x 1) = 1,140

Ways to Choose Two Predicates Out of Twenty

            (20 x 19) / (2 x 1) = 190

Ways to Choose One Predicate Out of Twenty

            20 / 1 = 20

Total Ways to Choose Up To Six Predicates Out of Twenty

            38,760 + 15,504 + 4,845 + 1,140 + 190 + 20 = 60,459


This brings the grand total number of static SQL statements that must be coded to 60,459.  The calculations are shown in Figure 1.  In a situation like this, if static SQL is being forced upon us, we have one of two options:

1.   code for forty days and forty nights hoping to successfully write 60,459 SQL statements
2.   compromise on the design and limit the users flexibility

I can guarantee that 99.99% of the time the second option will be chosen.  My solution would be to abandon static SQL and use dynamic SQL in this situation.  How would this ease the development situation?  Consider the following:
  • With dynamic SQL, the twenty predicates need be coded only once (in working storage)
  • As the program runs, the application logic can build the complete SQL statement based upon user input
  • The size of the DBRM will decrease dramatically.  The DBRM for the static SQL program will be huge if it contains all of the 60,459 SQL statements.  Even if a compromise number is reached, chances are that the DBRM will be large.  And guaranteed it will be larger than the DBRM for the dynamic SQL program.
  • Although there will be additional run-time overhead to build the SQL and perform the dynamic Prepare, performance may not suffer. Queries issued against non-uniform data, may actually experience improved access paths and perform better.

So When Should You Seriously Consider Dynamic SQL?
  • When the nature of the program is truly changeable, such as the example given in the text above.
  • When the columns to be retrieved can vary from execution to execution.  This is similar to the example given where multiple combinations of predicates can be chosen, but in this case, multiple combinations of columns can be chosen.
  • When benefit can be accrued from interacting with other dynamic SQL applications.  
  • When the SQL must access non-uniform data.
You can find some additional guidance for helping you to evaluate when to use dynamic versus static SQL in my Data Perspectives column Dynamic vs. Static SQL.

Synopsis

Dynamic SQL is not always bad... and it is already pervasive in distributed and web applications.  In this day and age, dynamic SQL should be considered as a viable option even for traditional mainframe applications that are not distributed or web-based.





[1] It is true that for performance reasons you may want to put the predicate with the highest cardinality within each type of operator first, but we will not concern ourselves with this in this blog post.

Monday, February 17, 2014

Rebinding for Optimal DB2 Access Paths

The access paths formulated by the DB2 optimizer during the BIND and REBIND processes are critical to your application performance. It is these access paths that determine not only how DB2 data is accessed by your program, but how efficiently it is accessed. Whether you are preparing a new program, implementing changes into your existing DB2 applications, upgrading to a new version of DB2, or simply trying to achieve optimum performance for existing applications, an exhaustive and thorough REBIND management policy should be of paramount importance.

However, many organizations are not doing everything possible to keep access paths up-to-date with the current state of their data. So what is the best practice approach for rebinding your DB2 programs? The answer is “The Five R’s.” This methodology of regular rebinding followed by a review of your access paths required the following steps:

  1.       Start with an inspection of the RTS (Real Time Statistics) to determine which objects need to be reorganized.
  2.       Follow that up by running a REORG on table spaces and indexes as appropriate based on the statistics.
  3.       After reorganizing, run RUNSTATS (to ensure the DB2 Catalog is up-to-date).
  4.       Follow that with REBINDs of your programs.
  5.       Then we need that fifth R – which is to review the access paths generated by the REBIND.
For shops that have avoided rebinding for years this approach represents a significant change. So what new DB2 features are available to help? Well, several releases ago, back in DB2 9 for z/OS, plan stability was added. This feature enables you to save a backup version of your access paths as a precautionary measure. If any of the new access paths after rebinding are less efficient, the DBA can switch back to the backed up access paths. In order to implement this level of stability you can use the PLANMGMT parameter of the REBIND command. There are three options: NONE, BASIC, and EXTENDED. BASIC saves the previous access paths, and EXTENDED saves the previous and an original. You can use REBIND and the SWITCH parameter to revert back to the saved access paths when the new access paths cause degraded performance. 

As of DB2 10 for z/OS you can tell DB2 to try to reused previous access paths for SQL statements whenever possible. This is called access path reuse and is implemented using the APREUSE bind option. When invoked, DB2 uses information about the previous access paths to create a hint.

When BIND PACKAGE or REBIND PACKAGE specifies APREUSE(ERROR), DB2 tries to locate the access path information from a package that has a matching identity. If no such package exists, DB2 tries to locate another recent version of the package that has the matching location, collection ID, and name. The APREUSE option only applies to statements that have identical statement text in both packages. Newly added statements and statements with text changes never reuse previous access paths.
Reusing access paths can help to minimize administrative and performance issues involved in rebinding.

Of course, there are products on the market which can be used to implement a proactive approach to rebinding. These products preview the new access paths and then run them through a rules system to determine if the new access will be improved, unchanged, or degraded. With this information we can rebind everything that would improve and avoid rebinding anything else until we can analyze the cause of the degradation. Using such an approach you should not have degraded access paths sneaking into your production environment.

Summary

At any rate, a systematic approach to DB2 binding and rebinding is necessary to assure optimal performance within your DB2 applications. This short blog entry covers some of the basics and recent changes to DB2 in this area. 

Be sure to take your time and to plan your statistics-gathering and bind/rebind approach... or be ready to be in constant firefighter mode as you address poorly-performing SQL in your applications!


Sunday, February 09, 2014

Those Good Old IBM Mainframe Utility Programs

Most mainframe programmers are aware that IBM supplies many utility programs that are useful for system maintenance chores such as copying and deleting files, manipulating partitioned data sets, and the like.  

These utilities typically begin with an IEB, IEF, or IEH prefix.  One of the most common of these is IEFBR14, which is used to catalog a data set.  But few people are aware, however, that IBM also supplies many other utilities that can make the life of a programmer much easier.  Below is a select list of these:

IEBMAGIC        fixes any problem without having to use your brain; perfect for consultants and contract programmers

IEBIQUIT         automatically updates your resume, writes a letter of resignation, forwards it to your boss, and prints reams of paper to make it look like you actually did something when you worked here

IEBIBALL         compares any source file to the last one that actually worked displaying all changes and who made them; perfect tool for technical support personnel overwhelmed by programmers chanting the phrase "I didn't change anything"

IEBPANIC         if all else fails, run IEBPANIC; sometimes it fixes your problem and sometimes it doesn't, but it never tells you how it did it; companion program to IEBMAGIC;

IEBNOTME      alters all trace of your userid from system change logs, SMF, etc.; useful to eliminate finger-pointing; this should always be run before running IEBIQUIT

IEFINGER        when designing on-line systems sometimes intermediate screens are sent that serve no purpose other than to tie together a series of related transactions; these intermediate screens generally direct the user to "Press ENTER to Continue"; IEFINGER simulates the end user pressing the ENTER key thereby eliminating unnecessary screens

IEHAMMER      forces a square peg into a round hole; for example, if you try to compile a COBOL program using the FORTRAN compiler, attaching IEHAMMER to the compile will make it work

IEBPIG               finds all unused resources of any type and assigns them to any specified job

IEBHAHA         randomly changes source code in test libraries; it is often speculated that IEBHAHA is the cause of most program problems that IEBIBALL is used to correct

IEBEIEIO          run this utility when you have too many problems to handle for one person;  it corrects the old "with an abend here, and a meeting there, e-i-e-i-o" syndrome by causing a system problem so large (in someone else's application) that all direction is diverted away from you to them

So did I forget your favorite?  Drop a comment below to share it!

Saturday, February 01, 2014

The Twelve DBA Rules of Thumb... a summary

Over the past couple of months this blog has offered up some rules of thumb for DBAs to follow that can help you to build a successful and satisfying career as a DBA. These twelve rules of thumb worked well for me as I worked my way through my career and I have shared them with you, my faithful readers, so that you can benefit from my experiences. I hope you find them useful... and if I have missed anything, please post a comment with your thoughts and experiences on being a good DBA.


As a reminder of what we have discussed, I am posting a short synopsis of the Twelves DBA Rules of Thumb here, along with links to each blog post.

 1. Write Down Everything
 2. Automate Routine Tasks
 3. Share Your Knowledge
 4. Analyze, Simplify and Focus
 5. Don't Panic!
 6. Be Prepared
 7. Don't Be a Hermit
 8. Understand the Business, Not Just the Technology
 9. Ask for Help When You Need It
10. Keep Up-to-Date
11. Invest in Yourself
12. Be a Packrat


Good luck with your career as a DBA...