The Db2 Portal Blog: UNLOAD

Showing posts with label UNLOAD. Show all posts

Thursday, December 17, 2020

Db2 Utilities and Modern Data Management

Db2 utilities are the unappreciated, and often over-looked, workhorses of your mainframe Db2 environment. They perform the dirty work that has to be done to populate, organize, backup, and recover your vital mainframe data. Without them, building effective Db2 databases, managing data, optimizing performance, and even accessing mainframe data would be a lot more difficult than it currently is.

The Situation

Think about the Db2 utility situation at your shop. If you are like most organizations you will have Db2 utilities regularly running all the time. There are load and unload tasks running to refresh data for development and testing, for moving data between environments for analysis and processing, and for various other purposes. The LOAD and UNLOAD utilities bear a lot of the hard work for data movement.

You are also most likely reorganizing data using a REORG utility for most of your Db2 table spaces and probably indexes, too. In many cases reorganization jobs are scheduled to run on a regular basis: weekly, monthly, quarterly, etc. Frequently you just set these jobs up when the object is created. The job gets scheduled and is just run without anybody taking a look at them unless, or until there are performance problems.

Then there are COPY and RECOVER utilities for backing up and recovering data when there are problems. The image copy backup jobs are running all the time, taking either full or incremental copies to ensure that you can recover data in case problems are encountered. The copies are running all the time, but the recover jobs (hopefully) are not running all the time!

You are also going to be running the RUNSTATS utility to gather statistics for Db2 to use for query optimization. Depending on how often your data changes, you may be running RUNSTATS frequently or infrequently. Many times the same fate as REORG befalls RUNSTATS… that is, it is scheduled and forgotten about unless problems arise.

There are other utilities, like CHECK which is used to verify the integrity of data. You are probably not running this one very often but when you need it you want it to run fast, right?

So, all of these utilities are “out there” running and consuming CPU to move, copy, and manage your Db2 data. But are they being run effectively as possible?

Moving to the Modern Db2 Utility Way

I think by this point everybody will agree that utility type processing is not just critical, but mandatory for a Db2 environment. But just running with the bare basics is not the best approach.

If we think about data movement with unload and load processing there are several things that you might want to consider for improvement. First of all, consider the speed and performance of the unload and load tasks. You probably want these jobs to run as fast as possible – that is, to consume as little elapsed time as possible to complete. After all, you are probably using these utilities to build environments or even refresh portions of an environment… and there will be developers and testers waiting to use that data as soon as it is available. Using the fastest utility programs available will minimize the wait time and make your developers and testers more productive. Furthermore, you want these tasks to consume as little CPU as possible to reduce your monthly mainframe bills!

In some cases you might want to re-consider unloading and loading altogether, using alternate utilities and offerings that can clone an entire subsystem or move data outside the control of Db2 at the data set level.

If we think about reorganization, it is likely that you are running REORG tasks that don’t need to be run, at least not as regularly as they are being run. At the same time, it is also likely that you are not running other REORG tasks as frequently as you should, thereby causing every other task that accessing the data to degrade. Fortunately, you can use RTS (real time statistics) to help guide when you should (and should not) reorganize your data. In the best case the utility itself relies on RTS to figure out if it needs to run and runs when it makes sense only. Failing this, you are again likely consuming more CPU than is necessary (either running unneeded REORGs or accessing poorly organized data, as the case may be).

If you think about your backup and recovery situation, the issue is likely complexity. Sure you want COPY and RECOVER utilities that run fast and consume minimal CPU, but the big issue is analysis. By that I mean, when you need to recover you want to make sure that you can use the image copies (and, of course, the log) to recover and meet your RTOs (recovery time objectives). But creating recover jobs on-the-fly, in a probably complicated environment with inter-related tables and data, can be difficult. And doing so when there is an outage, which is usually the case, exacerbates the situation. Using intelligent utilities to create the right image copies and to automatically build an appropriate recovery strategy when needed should be the modern approach.

And not to neglect RUNSTATS and CHECK, you want both of those utilities to run as fast as possible, consuming minimal CPU, too. And you want guidance on when and how to run them using available RTS, statistics, and any system information available.

What Can You Do?

One approach is to use modern utilities, not only built for speed but that incorporate AI and machine learning to automate and improve the Db2 utility experience. BMC Software is once again on the vanguard with its BMC AMI utilities for Db2.

The first question you probably have is "What the heck is AMI?" Well, AMI, which stands for Automated Mainframe Intelligence, is technology that is being infused into BMC’s product line to leverage AI, machine learning, and predictive analytics to achieve a self-managing mainframe.

BMC AMI Utilities for Db2 are designed for modern complex Db2 environments. They use a centralized, intelligent architecture (see diagram below) designed specifically to handle the complexity facing IT today. Through intelligent policy-driven automation, you can use the AMI Utilities for Db2 to manage growing amounts of data with ease and, at the same time, deliver full application availability.

Figure 1. BMC AMI Utilities for Db2

If you are looking to reduce CPU and elapsed time by as much as 75%, eliminate downtime while delivering full application availability, lower disk usage, eliminate sort in your REORGs, and simplify complex utility operations, then it makes sense to take a look at the BMC AMI Utilities for Db2.

----------

You might also want to take a look at this blog post from BMC that discusses how to Save Time and Money with Updated Unload Times

And this analysis of the BMC next generation REORG technology from Ptak Associates

Wednesday, June 20, 2018

Fast and Effective Db2 for z/OS Test Data Management with BCV5

Perhaps the most significant requirement for coding successful Db2 application programs is having a reasonable set of test data to use during the development process. Without data, there is no way to test your code to make sure it runs. But there are many issues that must be overcome in order to have a useful test data management process. Today we will talk about this along with a key enabling component of such a process, BCV5 from UBS Hainer.

One of the first things that organizations try is to make a copy of the production for testing. But this is easier said than done. You cannot just stop your production databases to make a copy of them for testing. But you still want a fast, consistent copy of the data. Consistent in terms of the units of work and referential integrity. And maybe you just want some of the data, not all of it. And we haven’t even talked about the potential regulatory concerns if you are copying personally identifiable information.

When you initially go to build your test data environment, the tools at your disposal are likely the utilities that came with Db2. This means that you will start with solutions like unloading and loading the data. But the LOAD and UNLOAD utilities are not known for their speed, so this can take a long time to accomplish – both for the initial creation and for any subsequent refreshing of the test data. This is important because test data must be refreshed on a regular basis as application testing is performed. Without the capability to refresh it is impossible to compare test runs and develop your programs consistently.

So, what should you do? Well, the first step is to create a consistent test bed either from scratch or, more likely, from production. And you want to do this efficiently and without interrupting production processing. This core bed of test data can be manipulated to reduce its size and even to satisfy regulatory requirements. With a core set of data you can then develop procedures to copy this data out to the various development and QA environments. To succeed, you need a fast method of populating multiple environments, on demand, from the approved test bed.

A key to achieving such an environment is an efficient Db2 data copying tool like BCV5, which can be used to copy and refresh Db2 data very rapidly. BCV5 copies Db2 table spaces and indexes within the same Db2 subsystem or even between different Db2 subsystems much faster than unloading and reloading the data. Using BCV5 you can deliver speedy copies because it works directly at the VSAM level. As BCV5 copies at the VSAM level it can replace Db2-internal OBIDs with the correct target values. This is significantly more efficient than unloading and loading one row at a time. And it takes away the complicated user-managed OBIDXLAT capability of DSN1COPY.

If you have used DSN1COPY in the past you know that it can be difficult to use; this is not the case with BCV5. With DSN1COPY you must specify a series of parameters that describe the input, such as the PIECESIZE, NUMPARTS, DSSIZE, whether it is a LOB table space or not, and more. BCV5 determines all required values automatically, making things a lot easier and less prone to failure.

And if you use LOB and XML data, and these days who doesn’t, BCV5 handles this data like any other, copying it at the same rate as regular table spaces.

BCV5 copies everything, not just the physical Db2 data, but also all of the associated structures including databases, table spaces, tables, indexes, and even views, triggers, aliases, synonyms, constraints, and so on! And you don’t need to worry if objects already exist; BCV5 will check for compatibility and keep the environment accurate. And all of the functionality you’d expect is there, such as the ability to rename objects between environments and to run the copy job either manually or via a job scheduler. Furthermore, you can interact with BCV5 using either an ISPF or a GUI interface.

Using BCV5, you can even use image copies as the source for your test data. BCV5 can use the most recent image copy, or an older image copy chosen by generation number, timestamp, or data set name pattern. BCV5 can automatically identify the correct image copy data sets and use them as the source for the data to be copied. You can even use BCV5 to refresh indexes using image copies of indexes if they exist.

Keeping Db2 statistics accurate can be another vexing test data issue. Generally speaking, you want to keep statistics up-to-date, but in test you probably want test statistics to mirror production. BCV5 can copy both RUNSTATS and RTS (Real Time Stats) directly from the source environment into the target. There is no need for a separate RUNSTATS job or to do a REORG in order to collect an RTS baseline.

And let’s not forget the most impressive aspect of BCV5, its speed and efficiency. BCV5 runs tasks in parallel with automatic workload balancing to further improve the performance of copying Db2 data. This efficiency comes in three forms: less CPU consumption, less elapsed run time, and a reduction in the management steps which can be automated instead of being done manually.

A case in point, a large automobile manufacturer uses BCV5 to manage its large Db2 test data environment consisting of over 11,000 table space partitions, another 11,000+ index partitions, and 20 LOBs. Before deploying BCV5 the company required hundreds of jobs that took almost 2 weeks to create, configure, and execute. After automating the process with BCV5, the entire process requires only 6 jobs that can refresh the test environments in 91 minutes. Impressive, no?

UBS Hainer markets other tools that augment and assist BCV5. For example, its In-Flight Copy add-on can enable BCV5 to get up-to-the-moment accurate data by gathering information from the Db2 log to make consistent copies of table spaces and indexes. It also offers a Reduction and Masking Data add-on to assist with enforcing privacy regulations in your test data. And BCV4 can be used to duplicate an entire Db2 subsystem.

The bottom line is that setting up test data can be difficult and time-consuming. Without a well-thought-out approach to gathering and refreshing test beds, application developers and quality assurance personnel will run into issues as they try to test Db2 code with corrupted or improper data. If your organization has issues with effectively managing test data for your Db2 for z/OS developers, take a look at UBS Hainer’s BCV5 solution for quickly copying and refreshing Db2 data.

Monday, September 18, 2017

The Db2 12 for z/OS Blog Series - Part 17: A New Privilege for UNLOAD

Db2 12 for z/OS introduces a new privilege that, when granted, enables a user to be able to unload data using the DB2 IBM UNLOAD utility. In past releases, the SELECT privilege (or other higher level admin privileges) was required to unload data using the UNLOAD utility. But this was less than desirable.

Why? Well, one reason is that it created a potential security gap. Consider the situation where a table has column masks or row permissions. In such as case, a user with SELECT privilege against the table still might not be able to access all of the rows and columns because of the masks/permissions that are defined. However, the same user with the same privilege set could execute the UNLOAD utility and be able to read all of the data in the table. Such as situation is not ideal and would not pass an audit.

To remove this gap IBM has introduced a new privilege, the UNLOAD privilege. After you move to Db2 12 for z/OS, SELECT authority is no longer enough to be able to unload data. In order to unload data the user must be granted the UNLOAD privilege on that table. The UNLOAD privilege can only be granted on a table; it cannot be granted on an auxiliary table or a view. The UNLOAD privilege is required after you have moved to function level V12R1M500 or higher.

Of course, there is a workaround if you still want to allow users with the SELECT privilege to be able to unload using the UNLOAD utility. This requires setting a DSNZPARM named AUTH_COMPATIBILITY to "SELECT_FOR_UNLOAD". The default for this DSNZPARM is NULL, which means that the UNLOAD privilege is required.

Regardless of the privilege, keep in mind that tables with multilevel security impose restrictions on the output of your UNLOAD jobs. A row will be unloaded only if the security label of the user dominates the security label of the row. So it is possible that an unload may not actually unload every row in the table. If security label of the user does not dominate the security label of the row, the row is not unloaded and DB2 does not issue an error message.