Tuesday, June 14, 2016

Four Important Buffer Pool Tuning Knobs in DB2 for z/OS

DB2 has five (well, four current) primary adjustable thresholds that can be modified using the ALTER BUFFERPOOL command.  

These thresholds are as follows:

The Sequential Steal Threshold, or VPSEQT, is the percentage of the buffer pool that can be occupied by sequentially accessed pages. For example, at the default value of 80, when this threshold is reached, 80% of the buffer pool represents pages for sequential processing. Of course, 80% is just the default; you can modify this value based on your processing needs to any value ranging from 0 to 100. When this threshold is reached, DB2 will steal a sequential page first before stealing a page from a randomly accessed page. So, for data that is accessed mostly sequentially (for example, through scans and prefetching) consider increasing the value of this parameter, and for data that is accessed most randomly, consider decreasing the value of this parameter. A VPSEQT value of zero will prevent any sequential pages from lingering in the buffer pool and it will turn off sequential prefetch.  A VPSEQT value of 100 allows the entire buffer pool to be monopolized by sequential pages.

The next tunable buffer pool threshold is the Parallel Sequential Threshold, or VPPSEQT. This threshold indicates the amount of the buffer pool that can be consumed by sequentially accessed data for parallel queries. When this threshold is reached, DB2 will cease to steal random pages to store sequential pages accessed by parallel queries. The default value for VPPSEQT is 50%, indicating its size as 50% of the sequential steal threshold (VPSEQT). For example, if the buffer pool is defined as 1000 pages and VPSEQT is set at 80%, a query using I/O parallelism can consume up to 400 sequential pages (that is, 1000 x 80% = 800 for the sequential steal threshold and 800 x 50% = 400 for the parallel sequential threshold).

The third, and final sequential threshold is the Assisting Parallel Sequential Threshold (or VPXPSEQT). This threshold is no longer supported as of DB2 11 because Sysplex Parallelism is no longer supported. When it was available, VPXPSEQT was used to indicate the portion of the buffer pool that might be used to assist with parallel operations initiated from another DB2 in the data sharing group. 

The final two modifiable DB2 buffer pool thresholds are used to indicate when modified data is to be written from the buffer pool to disk. Log data is externalized when a COMMIT is taken, but writing of the actual data itself is controlled by the two deferred write thresholds (and DB2 system checkpoints).

First we have the Deferred Write Threshold (or DWQT). When DWQT is reached, DB2 starts scheduling write I/Os to externalize the data pages to disk. By default, the deferred write threshold is reached when 30% of the buffer pool is allocated to unavailable pages, whether updated or in use. The default is probably too high for most larger buffer pools.

DB2 also provides the Vertical Deferred Write Threshold (VDWQT), which is basically the same as DWQT but for a single page set. By default, VDWQT is reached when 5% of the buffer pool is allocated to one data set. When reached, DB2 will start scheduling write I/Os to externalize the data pages to disk. Once again, this default is most likely too high for most shops.

The VDWQT threshold can be specified as a percentage of the buffer pool, or as an absolute number of buffers. When you want to specify a relatively low threshold for VDWQT using an absolute number of buffers provides better granularity and control.

In general, consider ratcheting the deferred write thresholds down to smaller percentages (from the defaults) for most of your buffer pools. Doing so enables “trickle” write from the DB2 buffer pools. This means that the data is written asynchronously to disk regularly over time in smaller amounts, instead of storing up a lot of modified data that has to be written all at once when the threshold percentage is reached. Of course, the needs of every shop will vary.

And yes, there are other buffer pool tuning options other than these 5  4 parameters, such as changing the size of the pool, specifying min/max size for BP expansion/contraction, or altering the page most often used to fine tune buffer pool operations.

All of the above thresholds can be changed using the -ALTER BUFFERPOOL command. 

Wednesday, June 01, 2016

Carefully Code Your DB2 LIKE Predicates

The LIKE predicate is a powerful but potentially problem-causing operator that can be used to great effect in your SQL development efforts.

The LIKE predicate searches for strings that match a specified pattern. Here is the definition from the DB2 SQL Guide:

Read syntax diagram
>>-match-expression--+-----+--LIKE--pattern-expression---------->
                     '-NOT-'                             

>--+---------------------------+-------------------------------><
   '-ESCAPE--escape-expression-'   

But what does this mean? Well, the match-expression is the string to be tested for conformity to the pattern specified in pattern-expression. You can use the underscore and the percent sign characters as wildcards in the pattern to indicate 1 (underscore) or many (percent sign) indeterminate characters. 

The ESCAPE clause is used when you want to actually search for one of the wildcard characters.
But I do not really want to get into explaining the basics of how LIKE works here. If you really need more details on LIKE I refer you to the appropriate IBM Knowledge Center details.

The Semantics of LIKE
What I do want to do today is to give some advice on LIKE usage. First of all, be careful in terms of how you use the wildcard characters (underscore and percent sign).  The '_' character requires a matching character and the '%' character does not.  This can produce interesting results. For example, the following two queries are not equivalent, though at first glance you might think they were:

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter_%');

SELECT *
FROM   cust
WHERE (lastname LIKE 'Peter%');

Both will match to 'Peterson' and 'Peters', but the first will not match to 'Peter' because of the underscore. 

The LIKE predicate offers a great deal of flexibility and power to your SQL statements. Using LIKE you can quickly retrieve data based on patterns and wildcards. However, some uses of LIKE can be confusing to implement appropriately—especially when LIKE is used with host variables.

Let’s assume that you need to create an application that retrieves customers by last name, but the supplied value for last name can be either the entire name or just the first few bytes of that name. In that case, the following query can suffice:

SELECT custno, firstname, lastname
FROM   cust
WHERE  lastname LIKE :host_variable;

In order for this to work, when you enter the value for host_variable be sure to append percent signs (“%”) to the end of the value. The percent sign specifies that DB2 should accept as a match any number of characters (including 0). This must be done programmatically. So, if the value entered is SM, the host_variable should contain “SM%%%%%%%%” and if the value entered is SMITH, the host_variable should contain “SMITH%%%%%”. Append as many percent signs as required to fill up the entire length of the host variable. Failure to do so will result in DB2 searching for blank spaces. Think about it—if you assign “SMITH%” to a 10-byte host variable, that host variable will think it should search for “SMITH%”, that is SMITH at the beginning, four blanks at the end, and anything in the middle.

So, for “SMITH%%%%%”, SMITH will be returned, but so will SMITHLY (or any name beginning with SMITH). There is no way to magically determine if what was entered is a complete name or just a portion thereof. If this is not acceptable, then a single query will not likely be feasible. Instead, you would have to ask the user to enter whether a full name or just a portion is being entered.

What About Performance?

Notwithstanding the semantic details, there are performance considerations to understand when using LIKE, too. It is a good practice to avoid using the LIKE predicate when the percentage sign (%) or the underscore (_) appears at the beginning of the comparison string because they prevent DB2 from using a matching index. 

The LIKE predicate can produce efficient results, however, when you use the wildcard characters at the end or in the middle of the comparison string, for example:

InefficientCan be efficient with index
LIKE '%NAME' LIKE 'NAME%'                         
LIKE '_NAME'          LIKE 'NA_ME'


DB2 will not use direct index lookup when a wildcard character is supplied as the first character of a LIKE predicate. At bind time, DB2 cannot determine when a host variable contains a wildcard character as the first character of a LIKE predicate. The optimizer therefore does not assume that an index cannot be used; rather, it indicates that an index might be used. At runtime, DB2 determines whether the index will be used based on the value supplied to the host variable. When a wildcard character is specified for the first character of a LIKE predicate, DB2 uses a non-matching index scan or a table space scan to satisfy the search.

Summary

The LIKE operator brings powerful search capabilities to your DB2 SQL queries. Be sure to understand its capabilities and to use it appropriately in your development efforts.