The Db2 Portal Blog

Wednesday, July 31, 2013

Compression Becoming More Important in the Age of Big Data

DBAs and database professionals have been aware of the pros and cons of compressing data for years. The traditional argument goes something like this: with compression you can store more data in less space, but at the cost of incurring CPU to compress the data upon insertion (and modification) and decompress the data upon reading it. Over time, the benefits of compression became greater as compression algorithms became more robust, hardware assist chips became available to augment compression speed, and the distributed model of computing made transmitting data across networks a critical piece of the business transaction (and transmitting compressed data is more efficient than transmitting uncompressed data).

IBM has significantly improved compression in DB2 for z/OS over the years. In the early days of mainframe DB2 no compression capability came with DB2 out-of-the-box -- the only mechanism for compressing data was via an exit routine (EDITPROC). Many software vendors developed and sold compression routines for DB2. Eventually, IBM began shipping a sample compression routine with DB2. And then in DB2 Version 3 (1993) hardware-assisted compression was introduced. Using the hardware assist , the CPU used by DB2 compression is minimal and the cons list gets a little shorter.

Indeed, one piece of advice that I give to most shops when I consult for them is that they probably need to look at compressing more data than they already are. Compressed data can improve performance these days because, in many cases, you can fit more rows per page. And therefore scans and sequential processes can process more data with the same number of I/Os, thereby improving performance. Of course, you should use the DSN1COMP utility to estimate the amount of savings that can accrue via compression before compressing any existing data.

Eventually, in DB2 9 we even get index compression capability (of course, using different technology than data compression). At any rate, compressing data on DB2 for z/OS is no longer the “only-if-I-have-to” task that it once was.

Then along comes the Big Data phenomenon where increasingly large data sets need to be stored and analyzed. Big Data is typified by data sets that are so large and complex that traditional tools and database systems are ill-suited to process them. Clearly, compressing such data could be advantageous… but is it possible to process and compress such large volumes of data?

New alternatives to traditional systems are being made available that offer efficient resource usage based on principles of compressed sensing and other techniques. One example of this new technology is IBM’s BLU Acceleration, which is included in DB2 10.5 for Linux, Unix, and Windows. One feature of BLU Acceleration is extended compression, which eliminates the need for indexes and aggregation and operates on compressed data and can thereby eliminate the CPU time that would be required to decompress the data. Advanced encoding maximizes compression while preserving the order of encoding so compressed data can be quickly analyzed without decompressing it. It is an impressive technology as no changes are required to your existing SQL statements.

IBM reports that some clients using DB2 10.5 for LUW with BLU Acceleration have achieved compression rates 10 times greater than uncompressed tables.

Of course, BLU Acceleration is much more than compression (it combines in-memory, columnar and compression technologies), but for the purposes of today’s blog entry we won’t delve deeper into the technology. If you are interested in a little bit more on BLU read my high-level overview in my coverage of this year’s IDUG DB2 TechnicalConference.

So compression is becoming cool… who’d have thought that back in the 1980s when compression was something we only did when we absolutely had to?

Friday, July 26, 2013

Friday, July 19, 2013

DB2 Locking, Part 17: In Conclusion

Today's blog post concludes our multi-part series on DB2 for z/OS locking and concurrency. We have touched on a great many aspects of locking in this series. Such an in-depth, technical topic as DB2 locking can be difficult to master, but doing so can deliver a more clear understanding of how DB2 operates and how your programming and design decisions impact not only your application, but also the entire DB2 subsystem.

A Couple Additional Locking Ideas and Thoughts

Before concluding this series, here are a few more guidelines and thoughts on DB2 locking and concurrency:

It is a good idea to use clustering to encourage DB2 to maintain data that is accessed together on the same page. If you use page locking, fewer locks are required to access multiple rows if the rows are clustered on the same page or pages. You can also use larger page sizes to control the amount of data locked for page locking.
Consider using the free space parameters to influence locking. If you increase free space, fewer rows are stored on a single page. Therefore, fewer rows are locked by a single page lock. This approach can decrease contention. However, it consumes additional disk storage, and it can also decrease the performance of table space scans (because more pages with fewer rows must be read). Additionally, keep in mind that as data is added to the table the free space will decrease (because the new data is using it). As such, locking issues may become more prevalent.

You can also decrease the number of rows per page using the MAXROWS option of the CREATE TABLESPACE statement. The fewer rows per page, the less intrusive page locking will be because fewer rows will be impacted by a page lock. This approach is probably better than the free space approach (previous bullet) because new data will not impact number of rows per page.

Design your application programs with locking considerations in mind. THis is the Number One thing to remember to increase concurrency and minimize the impact of locking on DB2 application and system performance. You can minimize the effect of locking through proper application program design. This means:

Limiting the number of rows that are accessed by coding predicates to filter unwanted rows

Requesting only the data (rows and columns) that your actually need to perform your business processes

Perform modifications as close to the end of the unit of work as possible

And remember to avoid bachelor programming syndrome (see Part 9).

Summary

Of course, there are probably many more hints, tips, and guidelines for developing DB2 databases and applications with concurrency in mind, but I think a 17 part series is sufficient for my blog. If you want more details on concurrency (or any other aspect of DB2 for z/OS) might I recommend the latest edition of my book -- DB2 Developer's Guide, 6th edition.

And be sure to come back and review this series on locking if you get confused as you work to maximize the concurrency of your DB2 queries, transactions, and programs.

Finally, as a service to my readers, this post includes a directory/index to the 16 separate posts that make up this series.

Index of Blog Posts on DB2 Locking

DB2 Locking, Part 1: An Overview

DB2 Locking, Part 2: Table Space and Table Locks

DB2 Locking, Part 3: Locks Versus Latches

DB2 Locking, Part 4: Page and Row Locks

DB2 Locking, Part 5: Lock Suspensions, Timeouts, and Deadlocks

DB2 Locking, Part 6: Claims, Drains, and Partition Independence

DB2 Locking Part 7: Lock Avoidance, Related Issues, and Stuff

DB2 Locking, Part 8: LOBs and Locking

DB2 Locking, Part 9: Avoid Bachelor Programming Syndrome

DB2 Locking, Part 10: Know Your ISOLATION Levels

DB2 Locking, Part 11: Data Sharing Global Lock Management

DB2 Locking, Part 12: Lock Promotion and Escalation

DB2 Locking, Part 13: Optimistic Locking

DB2 Locking, Part 14: Using the LOCK TABLE Statement

DB2 Locking, Part 15: Tackling Timeout Troubles

DB2 Locking, Part 16: Skipping Uncommitted Inserts

Monday, July 15, 2013

DB2 Locking, Part 16: Skipping Uncommitted Inserts

DB2 10 for z/OS introduces a new technique for concurrency. As we have discussed previously in this series on DB2 locking, one of the most troublesome problems for DB2 performance analysts is reducing timeouts and lock suspensions.

The CONCURRENTACCESSRESOLUTION parameter can be used to specify a concurrent access resolution option to use for statements in a package when binding your application program.

There are two options. The first, USECURRENTLYCOMMITTED, indicates that DB2 can use the currently committed version of the data for applicable scans when data is in the process of being updated or deleted. Rows that are in the process of being inserted can be skipped. This clause applies only when the isolation level in effect is Cursor Stability (CS) or Read Stability (RS) making skip uncommitted inserts apply. It is ignored for other isolation levels.

The second option is WAITFOROUTCOME, which indicates that applicable scans must wait for a COMMIT or ROLLBACK operation to complete when data is in the process of being updated or deleted. Rows that are in the process of being inserted are not skipped.

The default behavior is WAITFOROUTCOME. Instead, if you choose to specify USECURRENTLYCOMMITTED DB2 will ignore rows that are in the process of being inserted and use only currently committed rows. This might be desirable for highly concurrent web transactions or to mimic the application concurrency settings or behavior of another DBMS.

Another option at your disposal is to specify that uncommitted inserts are to be skipped at the subsystem level (using option 19 on panel DSNTIPB during DB2 installation or by assembling a new DSNZPARM). The SKIPUNCI subsystem parameter specifies whether statements ignore a row that was inserted by another transaction if the row has not if the row has not yet been detected as committed. A newly inserted row can be detected as committed only after the lock held on the row has been released.

There are two options for this parameter:

Specifying YES for SKIPUNCI will cause DB2 to behave as though the newly inserted row has not yet arrived and the row is skipped, until the lock held on a newly inserted row is released.
SKIPUNCI(NO) indicates that DB2 will wait for the inserted row to be committed or rolled back. It then processes the row if the insert commits, or it moves on to find another row if the insert is rolled back. If a transaction performs one or more inserts, and then spawns a second transaction, specify NO for SKIP UNCOMM INSERTS if the first transaction needs the second transaction to wait for the outcome of the inserts. This is the default value.

By using either of these two methods to skip uncommitted inserts you can improve concurrent access to data as you tune your application’s locking and concurrency requirements.

Tuesday, July 09, 2013

DB2 Locking, Part 15: Tackling Timeout Troubles

Many shops battle with locking issues and frequently, the cause of performance issues can be traced back to locking issues, more specifically, lock timeout issues. When you experience a timeout, it means that another process holds a lock on the data that you are trying to modify.

You may experience timeout troubles as the dreaded -911 SQLCODE. Timeouts are different than most performance issues because the job or users receiving the -911 SQLCODE is usually not the source of the problem. Instead, the jobs causing the problem typically continue to run just fine, bringing about problems for everybody else!

The primary factors to examine when looking to resolve timeout issues are:

The lock size parameter that was specified at table space creation time.
The duration of the locks established by the BIND strategies. Lock duratiuon is determined by a combination of the ACQUIRE and RELEASE strategies for the table spaces and the ISOLATION parameters for the page locks.
The use of LOCK TABLE statements in any programs (See DB2 Locking Part 14).
The scope of the commit in the application code.
The manner in which the table space was started (RW or RO).

Most of the time, resource contention and timeouts are due to long-running programs that are not issuing COMMITs…or perhaps are not committing their work frequently enough. Issuing a COMMIT ends a unit of recovery and hardens all database modifications made during that unit of recovery.

So be sure to verify that all of your batch processes -- especially any that run concurrently with other workload (but really all batch process) -- have a COMMIT strategy. This means that your programs should issue a COMMIT after processing "a set number of" inserts, updates, and deletes. A COMMIT will tell DB2 to make the changes permanent and releases locks. A good rule of thumb is to strive for having no lock held for more than five seconds.

A reasonable approach is to set a counter that is incremented after every modification. Then, check it and when it exceeds a predefined threshold -- say 25 or 50 or 100 modifications -- then issue a COMMIT. You should make the threshold an input parameter so that you can change it as the workload in your system changes. For example, make it 25 when concurrent activity is high, but ramp it up to 100 or higher when it is low. Failure to issue COMMITs will result in timeouts, as well as possibly deadlocks and lock escalation.

For a more in-depth discussion on the importance of issuing COMMITs in your application programs review DB2 Locking Part 9.

Another practical approach for reducing lock duration is to make data modifications as close to the end of the unit of work as possible. Look at all of your programs, both batch and online, and try to save the data modification statements to as close to the COMMIT as you can. By saving the data modification until right before you issue a COMMIT, you reduce the overall average lock duration. This should result in reducing contention and therefore, the number of timeouts.

Some additional suggestions to maximize concurrency and reduce timeouts include:

Use WHERE CURRENT OF CURSOR to perform UPDATEs and DELETEs in your programs.
Ensure that you have created your table spaces with the appropriate LOCKSIZE (usually PAGE, but sometimes perhaps ROW).
If you have tables that are static, consider starting their table space as RO instead of RW. Doing so can enable DB2 to utilize table locking instead of page or row locking for those tables.
Limit the scope of SQL statements so that a statement locks 150 or fewer pages on a table for each unit of work that is performed.

Finally, if you want to investigate timeout details, be sure to examine the statistics trace class 3 and IFCID 0196 for timeouts (IFCID 0172 is for deadlocks).