Wednesday, December 10, 2014

An Extra DBA Rule of Thumb

Last year on the blog I posted a series of 12 DBA Rules of Thumb. As a quick reminder, these Rules of Thumb - or ROTS, are some general rules of the road that apply to the management discipline of Database Administration that I have collected over the years. These ROTs are broadly applicable to all DBAs, even though this is a DB2-focused blog.

Please click on the link in the paragraph above if you need a refresher on the DBA ROTs from last year.

The purpose of today's blog post is to suggest an additional Rule of Thumb... and that is to Diversify!  A good DBA is a Jack-of-All-Trades. 


You can’t just master one thing and be successful in this day-and-age. The DBA maintains production, QA and test environments, monitors application development projects, attends strategy and design meetings, selects and evaluates new products, and connects legacy systems to the Web.
And if all of that is not enough, to add to the chaos, DBAs are expected to know everything about everything. From technical and business jargon to the latest management and technology fads and trends, the DBA is expected to be “in the know.” For example, the DBA must be up on trends like Big Data and Analytics.
And do not expect any private time: A DBA must be prepared for interruptions at any time to answer any type of question… and not just about databases, either.
When application problems occur, the database environment is frequently the first thing blamed. The database is “guilty until proven innocent.” And the DBA is expected to be there to fix things. That means the DBA is often forced to prove that the database is not the source of the problem. The DBA must know enough about all aspects of IT to track down errors and exonerate the DBMS and database structures he has designed. So he must be an expert in database technology, but also have semi-expert knowledge of the IT components with which the DBMS interacts: application programming languages, operating systems, network protocols and products, transaction processors, every type of computer hardware imaginable, and more. The need to understand such diverse elements makes the DBA a very valuable resource. It also makes the job interesting and challenging.
To summarize, the DBA must be a Jack-of-all-Trades... and a master of several!!!

Tuesday, December 02, 2014

DSN1COPY Improvements in DB2 11 for z/OS

There have been some nice data validation improvements made to the IBM DSN1COPY utility in DB2 11 for z/OS. I suppose I should first explain what the DSN1COPY utility is before I talk about how it has been improved, so...

DSN1COPY is also known as the "Offline Copy utility." It has many uses. Of course, the primary use case for DSN1COPY is to copy data sets without DB2 having to be up and running.  DSN1COPY can be used to copy VSAM data sets to sequential data sets, and vice versa. It also can copy VSAM data sets to other VSAM data sets and can copy sequential data sets to other sequential data sets. As such, DSN1COPY can be used to:
  • Create a sequential data set copy of a DB2 table space or index data set.
  • Create a sequential data set copy of another sequential data set copy produced by DSN1COPY.
  • Create a sequential data set copy of an image copy data set produced using the DB2 COPY utility, except for segmented table spaces. 
  • Restore a DB2 table space or index using a sequential data set produced by DSN1COPY.
  • Restore a DB2 table space using a full image copy data set produced using the DB2 COPY utility.
  • Move DB2 data sets from one disk to another.
  • Move a DB2 table space or index space from a smaller data set to a larger data set to eliminate extents. Or move a DB2 table space or index space from a larger data set to a smaller data set to eliminate wasted space.
DSN1COPY runs as a batch job, so it can run as an offline utility when the DB2 subsystem is inactive. It can run also when the DB2 subsystem is active, but the objects it operates on should be stopped to ensure that DSN1COPY creates valid output. DSN1COPY does not check to see whether an object is stopped before carrying out its task. DSN1COPY does not directly communicate with DB2.

DSN1COPY 
performs a page-by-page copy. Therefore, you cannot use DSN1COPY to alter the structure of DB2 data sets. For example, you cannot copy a partitioned table space into a segmented table space.

Perhaps the nicest feature of  DSN1COPY is its ability to modify the internal object identifier stored in DB2 table space and index data sets, as well as in data sets produced by DSN1COPY and the DB2 COPY utility. When you specify the OBIDXLAT option, DSN1COPY reads a data set specified by the SYSXLAT DD statement. This data set lists source and target DBIDs, PSIDs or ISOBIDs, and OBIDs, thereby enabling you to modify these IDs accordingly (possibly for moving data from one subsystem to another).

You can also use DSN1COPY to  check the validity of table space and index pages. 

OK Then, But What's New?

So now that we understand the DSN1COPY utility, let's dig in to learn a little bit about how it has been improved in DB2 11 for z/OS. Basically, DB2 11 bring improved data validation to the DSN1COPY utility.

In DB2 11, the target data set produce by DSN1COPY is automatically validated after it is populated. The first time that the target data set is physically opened by an operation other than a utility, DB2 checks for inconsistencies in the data and the DB2 Catalog. The validation performed includes checking: 
  • DBID, PSID, and OBID
  • SEGSIZE and PAGESIZE
  • Table space type
  • Table schema (if the table space contains only one table)

If inconsistencies are found, DB2 throws a -904 SQLCODE and reports the issue. You can then use the REPAIR utility to remediate the reported issues. In past releases, validation did not occur immediately, which could have resulted in data corruption issues, storage overlays, and even ABENDs.

Summary

So you can rest easier knowing that DSN1COPY data is checked after it is created, thereby removing a lot of the chance for calamity if you ran the utility improperly... and that's a good thing!

Monday, November 24, 2014

Peace, Prosperity and Happy Thanksgiving

Just a quick blog post today to wish all of my readers, wherever they live, peace and prosperity during this holiday season.

And to my readers in the USA, Happy Thanksgiving. You know what that means, right? Relatives, football, and a lot of food, including turkey!

It also means that we are about to "officially" start the holiday shopping season, which begins on Black Friday. Of course, anyone who has been out in the mall, or to any store really, knows that the holiday season started right after Halloween. But for most people it begins this week!

So, Happy Thanksgiving to the US folks... and peace, happiness and warm thoughts to all of you everywhere this week.

Friday, November 14, 2014

Database Basics for Beginners

Every now and then I get e-mail from readers of my blogs asking introductory/beginner questions about databases and DBMS. I cannot take the time to answer all of these e-mails in-depth, so I thought I blog a quick post with some good resources for folks.

I think a good place to start is an article I wrote several years ago now titled What is a Database? This article breaks down the benefits of a database, outlines the difference between a database and a DBMS, and provides some guidance for further reading (suggested books).

Other questions I am get asked frequently involve database administration. One is: What Does a DBA Do? Follow the link to find my answer to that question. Another is: how can I become a DBA? I wrote an article titled How to Become a DBA to answer that one. And finally, another frequent topic is: How many DBAs do I need? That is a tricky one, but I propose a framework to help answer that question in an article titled DBA Staffing Considerations.

I also get a lot of DB2 for z/OS questions. And I've written a book on that topic, plus a bunch of DB2 articles, too (all of which can be found here).

So I guess what I am saying here is to take a look at what is already "out there" to see if your questions can be answered on the web. But, please, keep the questions coming. If I do not answer your e-mail do not be discouraged. I do read most of them (unless it gets caught in my spam collector). Even if I do not have the time to respond, I keep track of what is asked and use it as input into my writing process... so you may see an answer pop up online in a blog, article, or column I write... eventually.

Tuesday, November 11, 2014

On Building Appropriate DB2 Indexes

Perhaps the single most important thing that can be done to assure optimal DB2 application performance is creating correct indexes for your tables based on the queries used by your applications. Of course, this is easier said than done. But we can start with some basics. For example, consider the following SQL statement:

  SELECT  LASTNAME, SALARY
  FROM    EMP
  WHERE   EMPNO = '000010'
  AND     DEPTNO =  'D01';

What index or indexes would make sense for this simple query? The short answer is “it depends.” But the more important answer is to understand what it depends upon! First, think about all of the possible indexes that could be created. Your first short list probably looks something like this:

  • Index1 on EMPNO
  • Index2 on DEPTNO
  • Index3 on EMPNO and DEPTNO
  • Index4 on DEPTNO and EMPNO

This is a good start and one of either Index3 or Index4 is probably the best. Either allows DB2 to use the index to immediately lookup the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table you might want to examine the impact of creating yet another index on the table. Factors to consider include:
  •  Modification impact: DB2 will automatically maintain every index that you create. This means that every INSERT and every DELETE to this table will cause data to be inserted and deleted not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, the index will also be updated. So, indexes speed the process of retrieval but slow down modification.
  • Importance of this particular query: The more important the query the more you may want to tune by index creation. For example, if you are coding a query that will be run every day by the CIO, you will want to make sure that it performs optimally. Who wants to risk a call from the CIO complaining about performance? So building indexes for that particular query is very important. On the other hand, a query for a low-level clerk may not necessarily be weighted as high, so that query may have to make due with the indexes that already exist. Of course, the decision depend on the importance of the application to the business – not just on the importance of the user of the application.
  • Columns in the existing indexes: If an index already exists on EMPNO or DEPTNO it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always because the order of the columns in the index can make a big difference depending on the query. For example, consider the following query:

        SELECT   LASTNAME, SALARY
        FROM     EMP
        WHERE    EMPNO = '000010'
        AND      DEPTNO >  'D01';

In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than. Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO) DB2 potentially can use them both to satisfy this query so creating another index may not be necessary.

One final thought for today, and that is to build your indexes based on workload, not object by object. Many people make the mistake of just guessing as some indexes as they create tables for new projects. And sometimes this cannot be avoided because the SQL typically is not known before the database is created. But some of the guesses -- or maybe many of them -- are likely to be suboptimal at best, wrong at worst.

Indexes should be built to optimize access to data via your SQL queries. (Of course, there are indexes required to support RI and uniqueness, but let's leave them out of the discussion for the moment.) To properly create a set of indexes requires a list of the SQL to be used, an estimate of the amount of data in the table (and an estimate of column values if possible), an estimate of the frequency that each SQL statement will be executed, and the relative importance of each query. Only then can the delicate balancing act of creating the right indexes to optimize the right queries (most of the time) be attempted. If you are doing it any other way, you are doing it wrong.

Of course, there is much more to index design than we have covered so far. For example, you might consider index overloading to achieve index only access. If all of the data that a SQL query asks for is contained in the index, DB2 may be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY given information about EMPNO and DEPTNO.  And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well then we never need to access the EMP table because all of the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.

Keep in mind, though, that it is not prudent (or even possible) to make every query an index only access. This technique should be saved for particularly troublesome or important SQL statements.