Monday, December 13, 2010

More Indicator Variables Available in DB2 10 for z/OS

As you all should know by now, version 10 of DB2 for z/OS is generally available and has been for a month or so now. As such, it is probably time that I start to blog about some of the features of the new release. But instead of starting with one of the bigger features that you may already have heard about, I decided to start with a feature that has flown somewhat under the radar: extended indicator variables.


Those of you who write programs that deal with possibly null results should know what an indicator variable is. Basically, DB2 represents null in a special variable known as an indicator. An indicator is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to the end user, but must be provided for when programming in a host language (such as COBOL or PL/I). If the indicator variable is less than zero, then the column to which it applies has returned NULL.
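To make that concrete, here is a minimal sketch of a singleton SELECT using a null indicator; the table, column, and host variable names are mine, not from any manual:

EXEC SQL
  SELECT BONUS
    INTO :HV-BONUS :IND-BONUS
    FROM EMP
   WHERE EMPNO = :HV-EMPNO
END-EXEC.

After the statement executes, the program checks IND-BONUS: if it is less than zero, BONUS was NULL and whatever happens to be in HV-BONUS should be ignored.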


DB2 10 for z/OS enhances and extends the concept of an indicator variable so they can be used outside the scope of nullability. Consider the scenario where a program is being written to modify data. There are multiple combinations of columns that may need to be modified based on conditional programmatic processing. Maybe the program is for editing customer data. The customer has multiple columns that could be modified: name, address, telephone number, credit rating, etc. So the programmer codes up an online screen (e.g. CICS) with all of the data that can then be changed by the end user.


But what happens when the end user cracks the enter key to submit the changes? What actually changed and what stayed the same? Does the program check every value on the screen (perhaps hundreds) and build a separate UPDATE statement for each combination of data that might have been changed? Unlikely, since that would require 2**x - 1 statements (where x is the total number of columns), one for every possible combination of changed columns.


Yes, there are CICS options to help the programmer determine which values have changed (or simply save and compare). But until now, dealing with all the potential SQL statements could be problematic. Well, DB2 10 indicator variables come to the rescue. As of DB2 10 NFM you can use indicator variables to inform DB2 whether the value for an associated host variable has been supplied or not… and to specify how DB2 should handle the missing value.


This is an extended indicator variable. It can be applied to host variables and parameter markers. Extended indicator variables are enabled at the package level by using the EXTENDEDINDICATOR option of the BIND PACKAGE command. You can also enable them at the statement level for dynamic SQL by using the WITH EXTENDED INDICATORS attribute on the PREPARE statement.
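For example, enabling the feature might look something like this; the collection, package, and host variable names are made up for illustration:

BIND PACKAGE(CUSTCOLL) MEMBER(CUSTPGM) EXTENDEDINDICATOR(YES)

EXEC SQL
  PREPARE UPDSTMT ATTRIBUTES :ATTR-STRING FROM :STMT-STRING
END-EXEC.

where, for the dynamic case, the :ATTR-STRING host variable contains 'WITH EXTENDED INDICATORS'.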


How would this work? Well, extended indicator variables can be specified only for host variables that appear in the following situations:

  • The set assignment list of an UPDATE operation in UPDATE or MERGE statements
  • The values list of an INSERT operation in INSERT or MERGE statements
  • The select list of an INSERT statement in the FROM clause
OK, then, how would we use an extended indicator variable? By setting its value to tell DB2 how to proceed. The following values are available:

  • 0 (zero) or a positive integer: This indicates the first host identifier provides the value of this host variable reference and it is not null.
  • -1, -2, -3, -4, or -6: This indicates a null.
  • -5: If extended indicator variables are not enabled, this indicates a null; otherwise, a value of -5 indicates that the DEFAULT value is to be used for the target column for this host variable.
  • -7: If extended indicator variables are not enabled, this indicates a null; otherwise, a value of -7 indicates that the UNASSIGNED value is to be used for the target column for this host variable (in other words, treat it as if it were not specified in this statement).


For an INSERT, the -5 and -7 settings for an extended indicator variable end up with the same result. This is because an INSERT works by inserting the default value for any column that is missing. On the other hand, for an UPDATE (or the UPDATE portion of a MERGE), setting the extended indicator variable to -5 causes the column to be updated to its default value, whereas -7 causes the update of that column not to be applied at all.
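Here is a minimal sketch of how a program might exploit this for the customer-editing scenario described above; the table, column, and host variable names are hypothetical:

UPDATE CUSTOMER
   SET NAME   = :HV-NAME   :IND-NAME,
       PHONE  = :HV-PHONE  :IND-PHONE,
       CREDIT = :HV-CREDIT :IND-CREDIT
 WHERE CUST_ID = :HV-CUSTID;

Before executing the statement, the program sets each indicator based on what the user did on the screen: 0 if a new value was supplied (use the host variable), -1 if the field was blanked out (set the column to NULL), -5 to reset the column to its DEFAULT value, or -7 if the field was untouched (leave the column as it is). One UPDATE statement handles every combination of changed columns.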


With extended indicator variables then, there is no need for the application to re-send a column’s current value, or to know a column’s DEFAULT value. Which should make things easier for developers.

Thursday, October 28, 2010

IBM Information On Demand 2010 - The Final Keynote

The keynote session for the third day of the IOD conference features the authors of Freakonomics, Steven Levitt and Stephen Dubner. I've read their first book and it is an excellent read... I highly recommend it.

But, of course, there are the IBMers that must speak first. The session kicked off with a video on intelligence being infused into the devices we use in our everyday life. And this “smarter planet” improves our life in countless ways. Smart grids, smart healthcare, smart supply chains, etc. All of which make us more productive and effective not just in business, but in all aspects of our lives. IBM calls this the “Decade of Smart.”

The first part of the session then featured Mike Rhodin, Sr. VP IBM Software Solutions Group. He indicated that we are at the beginning of what is going to constitute massive changes to the way we look at and solve problems. He explained by talking about solutions for commerce that have changed over the last decade or so. The experience is vastly different today across the board. This is so in terms of how buying decisions are made, how buying is done, and how the transaction is completed.

But how can we know what the customer of the future wants? The idea now is to look at how you can leverage things like social media to perform “sentiment analysis” to engage in conversation. By making it a dialogue instead of a one way street we can start this transformation.

He talked about a Southwest flier who was dissatisfied by a delay and tweeted about it. A few days later a Southwest representative contacted him and offered him “something” to assuage his dissatisfaction. Although that is good, he said it would have been better if the Southwest rep was waiting for him at the gate at his destination. OK, but in my opinion, it would have been even better if the Southwest rep could have made contact before the plane took off (either on the plane or at the gate if they had not yet boarded).

Next up was Brenda Dietrich, VP Business Analytics and Mathematical Sciences and IBM Research. It was good to hear from someone in the research group because they don't get "out" to speak much. Dietrich described the global reach of IBM’s research group, with 9 major offices across the world and many more co-laboratories, which are smaller labs with the goal of working with more local talent.

If it has to do with the future of technology, IBM Research is probably involved in it. Examples include nanotechnology, supercomputing and workload-optimized systems, cloud computing, and analytics.

The future of the “smarter planet” lies in optimizing individual systems, like an electrical grid, and then developing systems of systems where those individual systems interact with other systems. For example, where the electrical grid interacts with the traffic grid. Additionally, today things are being digitized and we are analyzing and reacting to this information. We are moving toward using this information to model and predict outcomes.

She also discussed an analytics project called Smart Enterprise Executive Decision Support (SEEDS). It is a super-dashboard that IBM Research is working on. It incorporates a common data model with multiple IBM technologies to perform analytics that deliver better answers. Sounds exciting to me!

And IBM Research has even created a computer that plays Jeopardy! The video she showed demonstrating it was very impressive, especially because the computer understood the questions in natural language, which is very difficult for computers to accomplish.

Then the Freakonomics dudes came out. And they were very entertaining. They took turns telling stories. Levitt is the economist and Dubner is the writer, but both were eloquent speakers who mixed information with humor extremely well. Dubner started out with my favorite story from their first book: how the legalization of abortion led to a decrease in crime. If that surprises you, you really need to read the book(s).

Levitt followed and told the story of how adding Social Security numbers to tax forms caused 7 million children to vanish from the face of the Earth. It turns out that Americans are very immoral and had created children for the tax deduction. Levitt had trouble believing this until he talked to his father and was told that he himself had lost two brothers!

I would try to explain how they then moved from trying to teach monkeys to use money to a discussion of the proper pricing for prostitution services... but it would be far better if you read about it in their book(s). I know after seeing them that I am going to buy their new book, SuperFreakonomics.

All in all, though, it was a thoroughly entertaining and educational final keynote session at the IOD show.

Tuesday, October 26, 2010

A Video from IOD, DB2 -- Monday 10/25

This video was shot by Rebecca Bond at the IBM Information On Demand Conference 2010 in Las Vegas. She was interviewing DB2 folks on what they do and the benefit they get from attending IOD. I am the third interview... but don't just skip to me! Listen to Melanie and Fred, too!

News From The IOD Conference

As usual, IBM has put out a number of press releases in conjunction with the Information On Demand conference, and I will use today’s blog to summarize some of the highlights of these releases.

First of all, IBM is rightly proud of the fact that more than 700 SAP clients have turned to IBM DB2 database software to manage heavy database workloads for improved performance… and, according to IBM, at a lower cost. By that they mean at a lower cost than Oracle. Even though the press release does not state that these SAP sites chose DB2 over Oracle, the IBM executive I spoke with yesterday made it clear that that was indeed the case.

This stampede of SAP customers over to DB2 should not be a surprise because DB2 is SAP’s preferred database software. This might be surprising given that SAP recently acquired Sybase, but IBM notes that even Sybase runs SAP on DB2.

The press release goes on to call out several customers who are using DB2 with SAP and their reasons for doing so. For example, Reliance Life chose DB2 for the better transaction performance and faster access to data it delivered. Banco de Brasil, on the other hand, was looking to reduce power consumption and storage by consolidating its database management systems.

IBM also announced new software that helps clients automate content-centric processes and manage unstructured content. The highlight of this announcement is IBM Case Manager, software that integrates content and process management with advanced analytics, business rules, collaboration and social software.

IBM also enhanced its content analytics software. NTT DOCOMO of Japan is impressed with IBM’s offering. “With Content Analytics, we have an integrated view of all information that’s relevant to our business in one place regardless of where it’s stored,” said Makoto Ichise, Manager of Information Systems Department Group at NTT DOCOMO.

IBM also enhanced its Information Governance solutions and announced further usage of its InfoSphere Streams product for analyzing medical data to improve healthcare.

So IBM software keeps improving and helping us to better manage our data in a constantly changing world…

Monday, October 25, 2010

DB2 10 Technical Overview at IOD Conference

Today I attended the IOD conference and had the opportunity to listen to Jeff Josten present a technical overview of DB2 10 for z/OS. Even though information and specifications have been trickling out on DB2 10 for z/OS over the course of the past year or so, this is the first DB2 10 presentation I have attended subsequent to the GA announcement IBM made last week (announced October 19th, General Availability on October 22nd, 2010). So I’m fairly certain that everything Jeff will talk about will be officially part of DB2 10, instead of just the rumors, hints and allegations proffered prior to the GA announcement. I don’t think anything to be covered will surprise me, but we’ll see.

Jeff started off saying that the technical strategy for DB2 10 was four-pronged:

  • Continuous availability
  • Performance and scalability
  • Ease of management
  • Advanced application features

One of the key value propositions is the deep synergy with the System z hardware. Because the code does not need to be ported all over the place, it can take advantage of the hardware capabilities that bring improved performance and efficiency.

DB2 10 is being promoted as delivering 5 to 10 percent CPU performance improvement for batch and transaction workloads out-of-the-box, and 20 percent for new workloads. And then an additional 10 percent when you start using new features. This seems to be the high-level talking point for DB2 10 – but that is okay, it is a very good one!

DB2 V8 will go out of service in April 2012. But with the ability to go straight from V8 to 10, I’m betting a lot of V8 shops will skip 9 altogether and go right to 10. It looks like you have just under 2 years to get off of V8, though.

    DB2 10 takes advantage of many zEnterprise (the new mainframe announced a couple of months ago) features to deliver scalability. Examples include improved compression, cache optimization, blades for running the Smart Analytics Optimizer, etc.

    Jeff mentioned that we’ll be able to support 10 times more users by avoiding memory constraints in DB2 10. That is a big scalability improvement!

    DB2 10 went through the largest beta ever: 23 customers and more than 80 vendors. The focus was on testing production level workloads to ensure that the release is stable. That is different than past betas where the focus was on testing new features. And 22 of the beta customers are planning to go into production with DB2 10 next year. Impressive!

Before diving into the technical details, Jeff mentioned that not much has changed versus what IBM has been talking about over the past months. A couple of late additions include hash performance improvements, BIND performance improvements, REBIND not being required for packages flagged as private protocol (but they will fail if they actually use private protocol), and a new ZPARM for default SEGSIZE for DDL compatibility.

    Post GA delivery items include APREUSE and APCOMPARE (for reusing access paths instead of using “hints”) because beta testing exposed some quality items, the ability to delete a data sharing member, inline LOBs for SPT01, online REORG concurrency for materializing deferred ALTERs, and some temporal enhancements (e.g. TIMESTAMP WITH TIMEZONE support).

Previously, DBATs (DDF threads) were forced to behave as RELEASE(COMMIT) threads. This gave good storage usage, but at the expense of the CPU required to release resources at COMMIT and put the thread back on the queue. For DB2 10, customers can use RELEASE(DEALLOCATE), which keeps a high performance DBAT assigned to the distributed requester so the next time it comes in it can reuse the thread. This is a nice feature for shops with heavy distributed usage.
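As a sketch (the collection and package names here are invented for illustration), the packages used by the distributed work would simply be bound with the appropriate RELEASE option:

BIND PACKAGE(DISTCOLL) MEMBER(ORDERPGM) RELEASE(DEALLOCATE)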

    Another nice thing right out of the box is parallel index updating at INSERT. This used to be synchronous, each index modified one after the other. Now, the indexes can be modified in parallel, which should be a nice performance improvement for many shops!

    There is also a new buffer pool option for a “fully in memory” object for reading the data and pinning it in buffers. Don’t know about you, but I’ve been wanting that for years.

    Things that require a REBIND will include most access path enhancements, query parallelism improvements, IN list performance improvements, and Stage 2 predicates being pushed to Stage 1.

    Things that require NFM include DB2 Catalog concurrency, compress on insert capability, most utility enhancements, LOB streaming between DDF and the rest of DB2, INSERT improvements for universal table spaces, faster FETCH and INSERT with lower virtual storage consumption, SQL Procedure Language performance improvements, efficient caching of dynamic SQL with literals, as well as a few other “things.”

And then there are things that require NFM and DBA work, such as hashing, index include columns, inline LOBs, DEFINE NO for LOB and XML columns, MEMBER CLUSTER for universal table spaces, and online REORG for all DB2 Catalog and Directory table spaces.

    So it looks like our favorite DBMS is continuing to grow and expand offering high performance functionality and features that are unparalleled in the industry. Indeed, there is a lot of great and exciting new “stuff” on the way in DB2 10.

    Thursday, October 07, 2010

    Null Follow-up: IS [NOT] DISTINCT FROM

    After publishing the last blog post here on the topic of pesky problems that crop up when dealing with nulls, I received a comment lamenting that I did not address the IS [NOT] DISTINCT FROM clause. So today’s blog post will redress that failure.

First of all, IS [NOT] DISTINCT FROM is a relatively new predicate operator, introduced to DB2 for z/OS in Version 8. It is quite convenient to use in situations where you are looking to compare two columns that could contain NULL.

Before diving into the operator, let’s first discuss the problem it helps to solve. Two columns are not equal if both are NULL; that is because NULL is unknown, and a NULL never equals anything else, not even another NULL. But sometimes you might want to treat NULLs as equivalent. In order to do that, you would have to code something like this in your WHERE clause:

    WHERE COL1 = COL2
    OR (COL1 IS NULL AND COL2 IS NULL)

This coding would cause DB2 to return all the rows where COL1 and COL2 are the same value, as well as all the rows where both COL1 and COL2 are NULL, effectively treating NULLs as equivalent. But this coding, although relatively simple, can be unwieldy and perhaps, at least at first blush, unintuitive.

    Here comes the IS NOT DISTINCT FROM clause to the rescue. As of DB2 V8, the following clause is logically equivalent to the one above, but perhaps simpler to code and understand:

    WHERE COL1 IS NOT DISTINCT FROM COL2

The same goes for checking a column against a host variable. You might try to code a clause specifying WHERE COL = :HV :hvind (host variable and indicator variable). But such a search condition would never be true when the value in that host variable is null, even if the null indicator is set. This is because one null does not equal another null - ever. Instead, we’d have to code additional predicates: one to handle the case where both values are non-null and equal, and another to check that COL1 and :HV are both null. With the introduction of the IS NOT DISTINCT FROM predicate, the search condition can be simplified to just:

    WHERE COL1 IS NOT DISTINCT FROM :HV :hvind

    Wednesday, October 06, 2010

    Null Troubles

    A null represents missing or unknown information at the column level. A null is not the same as 0 (zero) or blank. Null means no entry has been made for the column and it implies that the value is either unknown or not applicable.

DB2 supports null, and as such you can use null to distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) and an unknown or inapplicable entry (NULL for both numerical and character columns).

    Nulls sometimes are inappropriately referred to as “null values.” Using the term value to describe a null is inaccurate because a null implies the lack of a value. Therefore, simply use the term null or nulls (without appending the term “value” or “values” to it).

    DB2 represents null in a special “hidden” column known as an indicator. An indicator is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to the end user, but must be provided for when programming in a host language (such as COBOL or PL/I).

    Every column defined to a DB2 table must be designated as either allowing or disallowing nulls. A column is defined as nullable – meaning it can be set to NULL – in the table creation DDL. Null is the default if nothing is specified after the column name. To prohibit the column from being set to NULL you must explicitly specify NOT NULL after the column name. In the following sample table, COL1 and COL3 can be set to null, but not COL2, COL4, or COL5:

    CREATE TABLE SAMPLE1
    (COL1 INTEGER,
    COL2 CHAR(10) NOT NULL,
    COL3 CHAR(5),
    COL4 DATE NOT NULL WITH DEFAULT,
    COL5 TIME NOT NULL);
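For example, this INSERT (a quick sketch against the SAMPLE1 table above) succeeds because COL1 and COL3 accept nulls, while leaving COL2 or COL5 out of the statement, or explicitly setting either of them to NULL, would fail:

INSERT INTO SAMPLE1
  (COL1, COL2, COL3, COL4, COL5)
VALUES
  (NULL, 'ABC', NULL, CURRENT DATE, CURRENT TIME);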

    What Are The Issues with Null?

The way in which nulls are processed usually is not intuitive to folks used to yes/no, on/off thinking. With null data, answers are not true/false, but true/false/unknown. Remember, a null is not known. So when a null participates in a mathematical expression, the result is always null. That means that the answer to each of the following is NULL:
    • 5 + NULL
    • NULL / 501324
    • 102 – NULL
    • 51235 * NULL
    • NULL**3
    • NULL + NULL
    • NULL/0
    Yes, even that last one is null, even though the mathematician in us wants to say “error” because of division by zero. So nulls can be tricky to deal with.

    Another interesting aspect of nulls is that the AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. To clarify with an example, if the COMM column is nullable, the result of the following query:

    SELECT AVG(COMM)
    FROM DSN8810.EMP;

    is not the same as for this query:

    SELECT SUM(COMM)/COUNT(*)
    FROM DSN8810.EMP;
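If you want the two results to line up, the denominator must count only the non-null COMM values. Logically (setting aside minor differences in result precision), the following sketch against the same sample table is equivalent to AVG(COMM):

SELECT SUM(COMM)/COUNT(COMM)
FROM DSN8810.EMP;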

    But perhaps the more troubling aspect of this treatment of nulls is “What exactly do the results mean?” Shouldn’t a function that processes any NULLs at all return an answer of NULL, or unknown? Does skipping all columns that are NULL return a useful result? I think what is really needed is an option for these functions when they operate on nullable columns. Perhaps a switch that would allow three different modes of operation:
    1. Return a NULL if any columns were null, which would be the default
    2. Operate as it currently does, ignoring NULLs
    3. Treat all NULLs as zeroes
    At least that way users would have an option as to how NULLs are treated by functions. But this is not the case, so to avoid confusion, try to avoid allowing nulls in columns that must be processed using these functions whenever possible.
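For what it is worth, you can approximate that third mode today by substituting zero for null before the function is applied; a quick sketch using the same sample table:

SELECT AVG(COALESCE(COMM, 0))
FROM DSN8810.EMP;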

    Here are some additional considerations regarding the rules of operation for nulls:

    • When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped at the high end of the sort order.
    • Nulls are considered to be equal when duplicates are eliminated by SELECT DISTINCT or COUNT (DISTINCT column).
    • A unique index considers nulls to be equivalent, so it disallows duplicate entries caused by the existence of nulls, unless the index is defined with the WHERE NOT NULL clause.
    • For comparison in a SELECT statement, two null columns are not considered equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls that are encountered cause the comparison to evaluate to UNKNOWN.
    • When a nullable column participates in a calculation, the result is null.
    • Columns that participate in a primary key cannot be null.
    • To test for the existence of nulls, use the special predicate IS NULL in the WHERE clause of the SELECT statement. You cannot simply state WHERE column = NULL. You must state WHERE column IS NULL.
    • It is invalid to test if a column is <> NULL, or >= NULL. These are all meaningless because null is the absence of a value.
    Examine these rules closely. ORDER BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and handle them accordingly. The SELECT statement, however, deems that the comparison of null columns is not equivalence, but unknown. This inconsistent handling of nulls is an anomaly that you must remember when using nulls.

    Here are a couple of other issues to consider when nulls are involved.

    Did you know it is possible to write SQL that returns a NULL even if you have no nullable columns in your database? Assume that there are no nullable columns in the EMP table (including SALARY) and then consider the following SQL:

    SELECT SUM(SALARY)
    FROM EMP
    WHERE DEPTNO > 999;

    The result of this query will be NULL if no DEPTNO exists that is greater than 999. So it is not feasible to try to design your way out of having to understand nulls!
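If you would rather get a zero back in that situation, you can wrap the aggregate; a quick sketch:

SELECT COALESCE(SUM(SALARY), 0)
FROM EMP
WHERE DEPTNO > 999;

Of course, you still had to understand nulls to know to do that, which rather proves the point.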

    Another troubling issue with NULLs is that some developers have incorrect expectations when using the NOT IN predicate with NULLs. Consider the following SQL:

    SELECT C.color
    FROM Colors AS C
    WHERE C.color NOT IN (SELECT P.color
    FROM Products AS P);

If one of the products has its color set to NULL, then the result of the SELECT is the empty set, even if there are colors that are not used by any product.
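A common way around this, sketched here against the same hypothetical Colors and Products tables, is to use NOT EXISTS, which is not tripped up by the NULLs:

SELECT C.color
FROM Colors AS C
WHERE NOT EXISTS (SELECT 1
                  FROM Products AS P
                  WHERE P.color = C.color);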

    Summary

    Nulls are clearly one of the most misunderstood features of DB2 – indeed, of most SQL database systems. Although nulls can be confusing, you cannot bury your head in the sand and ignore nulls if you choose to use DB2 as your DBMS. Understanding what nulls are, and how best to use them, can help you to create usable DB2 databases and design useful and correct queries in your DB2 applications.

    Friday, September 24, 2010

    A Recommended New DB2 Book

    Judy Nall has performed a much-needed service for the DB2 for z/OS community by writing her new book, DB2 9 System Administration for z/OS: Certification Study Guide. There are many DB2 for z/OS books (heck, I wrote one myself) that cover programming, performance, and database administration details. But never before has there been one that focused on system administration and system programming.

    Of course, the book is targeted at those looking to become an IBM Certified System Administrator for DB2 for z/OS. I have never taken the exams required for that certification, but the material in this book will go a long way toward making you a better system programmer for a mainframe DB2 environment.

Whereas some of the material can be found in greater detail in other books on the market, we must keep in mind the target market for this book. The coverage of DB2 fundamentals and performance is well-written and hits the mark for systems folks. And the chapters on installation and migration, system backup and recovery, and systems operation and troubleshooting offer great systems-level knowledge not found in other DB2 for z/OS books.

    So while DB2 9 System Administration for z/OS: Certification Study Guide is not for everyone, the people that it is for (systems programmers and systems DBAs) should enjoy it and benefit from the nice job Judy has done organizing and explaining the details of system administration for DB2 for z/OS.

    Thursday, August 26, 2010

    Free DB2 Education Webinar Series

Want to learn more about DB2 for z/OS but there is no money in the education budget? Can you spare an hour a week over the course of a month? Well then, you are in luck, because SoftwareOnZ is sponsoring a series of DB2 webinars presented by yours truly, Craig S. Mullins.

    Each webinar will be focused on a specific DB2 topic so you can pick and choose the ones that are most interesting to you – or attend them all and receive a certificate signed by me indicating that you have completed The DB2 Education Webinar Series.

    The schedule and topics for these sessions follows:

    September 28, 2010 – DB2 Access Paths: Surviving and Thriving

    Binding your DB2 programs creates access paths that dictate how your applications will access DB2 data. But it can be tricky to understand exactly what is going on. There are many options and it can be difficult to select the proper ones… and to control when changes need to be made.

    This presentation will clarify the BIND process, enabling you to manage DB2 application performance by controlling your DB2 access paths. And it will introduce a new, GUI-based product for managing when your programs need to be rebound.

    October 5, 2010 – Optimizing DB2 Database Administration

    DB2 DBAs are tasked with working in a complex technological environment, and as such, the DBA has to know many things about many things. This makes for busy days. How often have you asked yourself, “Where does the time go?”

    Well, the more operational duties that can be automated and streamlined, the more effective a DBA can be. This presentation will address issues that every DB2 Database Administrator and/or DB2 Systems Programmer faces on a daily basis. And it will introduce a new tool, DB-Genie, that will reduce the amount of time, effort, and human error involved in maintaining DB2 databases.

    October 12, 2010 – DB2 Storage: Don’t Ignore the Details!

    For many DB2 professionals, storage management can be an afterthought. What with designing, building, and maintaining databases, assuring recoverability, monitoring performance, and so on, keeping track of where and how your databases are stored is not top of mind. But a storage problem can bring your databases and applications to a grinding halt, so it is not wise to ignore your storage needs.

    This presentation will discuss the important storage-related details regarding DB2 for z/OS, including some of the newer storage options at your disposal. And we will also introduce a new web-based tool for monitoring all of your mainframe DB2 storage.

    October 19, 2010 – The DB2 Application Developer’s Aid de Camp

Building DB2 application programs is a thankless job. And it can be difficult to ensure that you have an effective and efficient development environment for coding DB2 applications. Can you easily identify which tables are related to which… and what indexes are available so you code queries the right way the first time? Do you have the right data to test your programs? Can you make quick and dirty changes to just a few tables or rows without having to write yet another program?

    This presentation will discuss the issues and difficulties that developers encounter on a daily basis as they build DB2 applications… and it will present a useful programmer-focused toolset for overcoming these difficulties.

    Summary

    Certainly there will be something of interest for every DB2 professional in at least one, if not all, of these complimentary web-based seminars.

    So what’s stopping you? Sign up today!

    Thursday, August 05, 2010

    DB2 Best Practices

    With today's blog entry I'm hoping to encourage some open-ended dialogue on best practices for DB2 database administration. Give the following questions some thought and if you've got something to share, post a comment!

    What are the things that you do, or want to do, on a daily basis to manage your database infrastructure?

    What things have you found to be most helpful to automate in administering your databases? Yes, I know that all the DBMS vendors are saying that they've created the "on demand" "lights-out" "24/7" database environment, but we all know that ain't so! So what have you done to automate (either using DBMS features, tools, or homegrown scripts) to keep an eye on things?

    How have you ensured the recovery of your databases in the case of problems? Application problems? Against improper data entry or bad transactions? Disaster situations? And have you tested your disaster recovery plans? If so, how? And were they successful?

    What type of auditing is done on your databases to track who has done what to what data? Do you audit all changes? To all applications, or just certain ones? Do you audit access, as well as modification? If so how?

    How do you manage change? Do you use a change management tool or do it all by hand? Are database schema changes integrated with application changes? If so, how? If not, how do you coordinate things to keep the application synchronized with the databases?

    What about DB2 storage management? Do you actively monitor disk usage of your DB2 table space and index spaces? Do you have alerts set so that you are notified if any object is nearing its maximum size? How about your VSAM data sets? Do you monitor extents and periodically consolidate? How do you do it... ALTER/REORG? Defrag utilities? Proactive defrag?

    Is your performance management set up with triggers and farmed out to DBAs by exception or is it all reactive, with tuning tasks being done based on who complains the loudest?

Do you EXPLAIN every SQL statement before it goes into production? Does someone review the access plans or are they just there to be reviewed in case of production performance problems? Do you rebind your programs periodically (for static SQL programs) as your data volume and statistics change, or do you just leave things alone until (or unless) someone complains?

    When do you reorganize your data structures? On a pre-scheduled regular basis or based on database statistics? Or a combination of both? And how do you determine which are done using which method? What about your other DB2 utilities? Have you automated their scheduling or do you still manually build JCL?

    How do you handle PTFs? Do you know which have been applied and which have not? And what impact that may be having on your database environment and applications? Do you have a standard for how often PTFs are applied?

    How is security managed? Do the DBAs do all of the GRANTs and REVOKEs or is that job shared by security administrators? Are database logons coordinated across different DBMSs? Or could I have an operating system userid that is different from my SQL Server logon that is different than my Oracle logon -- with no capability of identifying that the user is the same user across the platforms?

    How has regulatory compliance (e.g. PCI DSS, SOX, etc.) impacted your database administration activities? Have you had to purchase additional software to ensure compliance? How is compliance policed at your organization?

    Just curious... Hope I get some responses!

    Friday, July 30, 2010

    Happy SYSADMIN Day

    To all of the system administrators out there (and I include DBAs and network admins in that group), HAPPY SYSADMIN DAY.

For those who are unaware of this very important holiday, every year on the last Friday of July responsible people everywhere celebrate Sysadmin Day. The idea is to show some appreciation for the folks who keep your systems up and running every day of the week. There is even a page with some gift ideas if you are so inclined to get something for your favorite sysadmin. (Personally, I prefer cash....)

    At the very least you can wish him/her a Happy Sysadmin Day today... and hoist a beer or two in his/her honor at the pub this evening...

    Tuesday, July 13, 2010

    Classics of Computer Literature

    Although the main focus of this blog is DB2 and mainframe software, I thought it would be worthwhile to take some time to recommend a few classic books for computer professionals. I am an avid reader of all kinds of books, not only on technology but on a wide variety of topics. Periodically I will use my blog to extol the virtues of some of my favorite books.


    I'm starting with computer books as everyone reading this is probably in the field of IT. (...except maybe my Mom, hi Mom!) These books are not DBMS- or data-focused: I will recommend data and database books later, in some future blog posting.

So, here goes, my coverage of a nice starter set of 4 computer books that everyone should read...

Every computer professional should own a copy of Frederick P. Brooks Jr.’s seminal work, The Mythical Man-Month (Addison-Wesley Pub Co; ISBN: 0201835959). Brooks is best known as the father of the IBM System/360, the quintessential mainframe computer. He managed the projects that created the S/360 and its operating system.

    This book contains a wealth of knowledge about software project management including the now common-sense notion that adding manpower to a late software project just makes it later. The 20th anniversary edition of The Mythical Man-Month, published in 1995, contains a reprint of Brooks’ famous article “No Silver Bullet” as well as Brooks’ reflections on the twenty years since the book’s publication. If creating software is your discipline, you absolutely need to read and understand the tenets in this book.



    Another essential book for technologists is Peopleware (Dorset House; ISBN: 0932633439) by Tom DeMarco and Timothy Lister. This book concentrates on the human aspect of project management and teams. If you believe that success is driven by technology more so than people, this book will change your misconceptions. Even though this book was written in the late 1980’s, it is still very pertinent to today’s software development projects.

    DeMarco is the author of several other revolutionary texts such as Structured Analysis and Design (Yourdon Press; ISBN: 0138543801). This book almost single-handedly introduced the concept of structured design into the computer programming lexicon. Today, structured analysis and design is almost completely taken for granted as the best way to approach the development of application programs.



    If you are a systems analyst, application programmer, or software engineer then you will surely want Donald Knuth’s three volume series The Art of Computer Programming (Addison-Wesley Pub Co; ISBN: 0201485419). This multi-volume reference is certainly the definitive work on programming techniques.

Knuth covers the algorithmic gamut in this three volume set, with the first volume devoted to fundamental algorithms (like trees and linked lists), a second volume devoted to semi-numerical algorithms (e.g. dealing with polynomials and primes), and a final volume dealing with sorting and searching. Even though a comprehensive reading and understanding of this entire set can be daunting, all good programmers should have these techniques at their disposal.

    OK, I know, this is sort of cheating because it is a 3 book set, but so what... my blog... my rules!



    Finally, I’d like to recommend a good book on the history of computing. The old maxim still stands: "Those who do not know history are doomed to repeat it." But most computer specialists are only dimly aware of the rich history of their chosen field.

There are quite a few books available on computing history and most provide coverage of the basics. A current favorite, though, is The Universal History of Computing: From the Abacus to the Quantum Computer by Georges Ifrah. The book offers a comprehensive journey through the history of computing. Particularly interesting is the chronological summary offered up in Chapter 1. It starts out in 35000 BCE - the era from which we have discovered the first notched bones that were probably used for counting - and progresses into the modern era of computing.

    This book spans the complete history of information processing providing useful insight into the rise of the computer.


    Now I don’t pretend to believe that these are the only classic books in IT literature, but I do know that they will provide a good, solid core foundation for your IT library. Books promote knowledge better than any other method at our disposal. And knowledge helps us do our jobs better. So close down that web connection and pick up a book. You’ll be glad you did.

    Tuesday, June 22, 2010

    Access Your DB2 Catalog "Poster" Online

    If you're anything like me, you're constantly looking for DB2 Catalog table and column names. For writing catalog queries, for examining statistics, for looking at your table and tablespace parameters, for many, many things. But it is not very easy to keep reaching for the DB2 manuals (Which manual is it in? Which appendix was that? Why did they put it there? Did they move it?)...

So, many of us gladly tacked up those posters from Platinum Technology that graphically depicted the DB2 Catalog... and then later similar posters from CA, Inc. and BMC Software... but those posters have grown in size (as has the DB2 Catalog)... and they have become less useful over the years because they contain less information in smaller type.

    Well, here comes a solution from zSystems and SoftwareOnZ: a free and very easy to use online DB2 Catalog reference application.

    If you can use a web browser and a mouse then you can find the DB2 Catalog information you desire. Simply point and click on the appropriate table and you'll get its definition along with a listing of its columns and their data type and length. And you'll also get information about which columns participate in any indexes and what type of index (primary, unique, duplicate index).

    Not only that, there is both a V8 and V9 edition of the DB2 Catalog so you can easily toggle back and forth between the two versions. That is very handy for sites that have some V8 subsystems and some V9 subsystems!

    So if you are looking for a better way to view your DB2 Catalog information, be sure to check out the online DB2 Catalog reference from zSystems.

    Thursday, May 13, 2010

    IDUG NA 2010, Days Two and Three

    I’ve been running around kinda busy the past couple of days here at IDUG in Tampa, so I got a bit behind in blogging about the conference. So, today I’m combining two days of thoughts into one blog post.

    (For a summary of IDUG Day One, click here.)

I started off day two by attending Brent Gross’ presentation on extracting the most value from .NET and ODBC applications. Brent discussed some of the things to be aware of when developing with .NET, an important “thing” being awareness that .NET is designed to work in a disconnected data architecture. So applications will not go through the data a row at a time; instead, the data is sent to the application, which processes it there. As an old mainframe DBA, I heard alarm bells ringing.

    I also got the opportunity to hear Dave Beulke discuss Java DB2 developer performance best practices. Dave delivered a lot of quality information, including the importance of developing quality code because Java developers reuse code – and you don’t want bad code being reused everywhere, right?

Dave started out mentioning how Java programmers are usually very young and do not have a lot of database experience. So DBAs need to get some Java knowledge and work closely with Java developers to ensure proper development. He also emphasized the importance of understanding the object to relational mapping method.

    From a performance perspective Dave noted the importance of understanding the distributed calls (how many, where located, and bandwidth issues), controlling commit scope, and making sure your servers have sufficient memory. He also indicated that it is important to be able to track how many times Java programs connect to the database. He suggested using a server connection pool and to be sure that threads are always timed out after a certain period of time.

    And I’d be remiss if I didn’t note that Dave promoted the use of pureQuery, which can be used to turn dynamic JDBC into static requests. Using pureQuery can improve performance (perhaps as much as 25 percent), as well as simplifying debugging & maintenance.

    Dave also discussed how Hibernate can cause performance problems. Which brings me to the first session I attended on day three, John Mallonee’s session titled Wake Up to Hibernate. Hibernate is a persistent layer that maps Java objects to relational tables. It provides an abstraction layer between DB2 and your program. And it can also be thought of as a code generator. Hibernate plugs into popular IDEs, such as Eclipse and Rational tools. It is open source, and part of JBoss Enterprise Middleware (JBoss is a division of Red Hat).

    John walked attendees through Hibernate, discussing the Java API for persistence, its query capabilities (including HQL, or Hibernate Query Language), and configuration issues. Examples of things that are configurable include JDBC driver, connection URL, user name, DataSource, connection pool settings, SQL controls (logging, log formatting), and the mapping file location.

HQL abstracts SQL. It is supposed to simplify query coding, but from what I saw of it in the session, I am dubious. John warned, too, that when HQL is turned into SQL, the SQL won’t necessarily look the way you are used to seeing it. He recommended setting up the configuration file so that it formats the generated SQL; otherwise it won’t be very readable. John noted that one good thing about HQL is that you cannot easily write code with literals in it; it forces you to use parameter markers.

    OK, so why can Hibernate be problematic? John talked about four primary concerns:

1. SQL is obscured
2. Performance can be bad with generated code
3. Hibernate does not immediately support new DB2 features
4. The learning curve can be high

But he also noted that as you learn more about these problems -- and how Hibernate works -- things tend to improve. Finally (at least with regard to Hibernate), John recommends using HQL for simple queries, native SQL for advanced queries, JDBC for special situations, and native DB2 SQL (e.g. stored procedures) to achieve the highest performance.

    I also attended two presentations on the DB2 for z/OS optimizer. Terry Purcell gave his usual standout performance on optimization techniques. I particularly enjoyed his advice on what to say when someone asks why the optimizer chose a particular path: “Because it thinks that is the lowest cost access path.” After all, the DB2 optimizer is a cost-based optimizer. So if it didn’t choose the “best” path then chances are you need to provide the optimizer with better statistics.

And Suresh Sane did a nice job in his presentation discussing the optimization process and walking through several case studies.

    All-in-all, it has been a very productive IDUG conference… but then again, I didn’t expect it to be anything else! Tomorrow morning I deliver my presentation titled “The Return of the DB2 Top Ten Lists.” Many of you have seen my original DB2 top ten lists presentation, but this one is a brand new selection of top ten lists… and I’m looking forward to delivering it for the first time at IDUG…

    Wednesday, May 12, 2010

    IDUG Tampa 2010, Day One

    As usual, the North American IDUG conference is proving to be a hectic, yet enjoyable and informative time. The days are packed from morning til evening with technical sessions, networking, and running from here to there and back again.

Tuesday was the first day for normal IDUG sessions (the day-long seminars were moved to Monday this year), and the day was dominated (for me at least) by DB2 10 sessions. The spotlight session by Jeff Josten was an information-packed 90-minute overview of DB2 10 that can only be described as drinking from a firehose. About 200 other curious attendees and I sat at attention as Jeff discussed the features that back up the themes of Version 10, which are efficiency, resilience, and growing new workloads on DB2 for z/OS.

    Jeff didn’t share a GA date for the new version, nor would anyone else from IBM this week, but it has been strongly hinted that it could be before the end of the year (2010).

    The biggest “thing” being touted by IBM about DB2 10 is the performance gains it delivers right out-of-the-box. Jeff discussed IBM’s performance objectives as historically being to deliver less than a 5% performance regression from release to release. But things have perked up recently. For DB2 9, most customers reported no regression or gain out of box. And the new goal is no longer containing regression, but delivering gain. For DB2 10, the expectation is that many customers will reduce CPU time 10% to 20% right out-of-the-box.

    In IBM’s labs, Jeff indicated that the out-of-the-box CPU reduction numbers for traditional workloads are ranging from 5-10% and for newer workloads (e.g. TCP/IP, stored procedures) the improvement is as much as 20% in lab measurements. And when you start using new functionality, you can reasonably expect to see up to 10% CPU reduction. Of course, Jeff was careful to note that these are pre-GA numbers so things could change, even though there is no expectation that they will change.

    Additionally, there is a lot of focus on scalability in DB2 10. Shops can expect to support 5x to 10x more concurrent users, up to 20,000 per subsystem. This is possible due to virtual storage relief: threads have been moved above the bar.

Jeff went on to cover a lot of additional new functionality to be delivered with DB2 10 including parallel index update during INSERT (which should speed up inserts against tables with multiple indexes), DB2’s usage of 1MB page size (z/OS) in buffer pools, multiple SQL access path and performance improvements, efficient caching of dynamic SQL with literals, LOB streaming between DDF and the rest of DB2, workfile spanned records (PBG), INSERT improvements for UTS, solid state disk monitoring and exploitation, temporal data support, timestamp data type improvements, and more.

    Hash support is particularly interesting. With hashing you can get direct access to data with a single getpage instead of the multi-getpage approach of b-tree indexing. The targeted use case for hashes is for lookup of a row based upon primary key. The hashing algorithm is stored in the DB2 engine. Never fear, though, because you can still define additional indexes on hashed tables and the optimizer will understand and prefer hashed access when it is possible. (I hear the IMS DBAs out there laughing. DB2 DBAs are now going to need to understand space calculations for hash space and what collisions and overflow means.)

Next up was Roger Miller, who covered DB2 10 from a database administration perspective. He began his session by referencing the extra detail that is available in the DB2 10 webcast presentation that Roger delivered about a month ago, which is available on the web.

    Roger states that a lot of what is at the heart of DB2 10 is about making things easier for DBAs. And then to prove his point he talked for an hour about all of those things. Highlights included the reduced need for REORG, monitoring enhancements, hashing, and pureXML enhancements for usability, scalability, and performance.

    A particularly interesting point made by Roger is that query parallelism these days is less about decreasing elapsed time and more about the ability to shuttle workload to a zIIP.

    Roger also discussed the ability to skip V9 and go directly from V8 to V10. He also expressed concern that folks who choose to do this not ignore learning all about V9 when they do this. For example, RUNSTATS in V9 had key changes, so shops need to be careful to run RUNSTATS when moving to V10.

    Roger also spoke about the significant changes to the DB2 Catalog and DB2 Directory in DB2 10. There are about 60 new table spaces, the links have been removed, inline LOBs are used in many places, and row level locking is used. These changes mean that online REORG works for everything in the catalog & the directory.

    He also spoke about the various improvements to security administration in DB2 10. There is a new SECADM authority with no access to data and there is also a new option for DBADM without data access. Another nice new option is DBADM authority for every database in the subsystem. And then there is the ability to REVOKE without cascading, something that DB2 security administrators have been looking for for years!

    Changing pace, I attended Billy Sundarrajan’s presentation on “De-mystifying JDBC Universal Drivers – for the z/OS DBA.” The reality is that more and more dynamic SQL applications are being implemented, so knowing about JDBC drivers is a necessity, not a luxury for the mainframe DBA.

Billy discussed the types of JDBC drivers and the installation issues involved. You can connect using a Type 2 or Type 4 driver. The Type 2 driver connects directly without a DB2 Connect gateway; the Type 4 driver connects through a DB2 Connect gateway.

    He also discussed the benefits of setting end user variables for monitoring and the different properties that can be used for configuration.

    Of course, I attended a few other sessions and spent some time at the exhibit hall and caught up with some old friends and… well, this is long enough of a post for the first day… check back tomorrow for a shorter (I promise) synopsis of day two.

    Sunday, May 09, 2010

    IDUG in Tampa

    It is Sunday, May 9, 2010 and I'm posting a brief blog entry today to remind everyone about IDUG in Tampa this week. I will be attending (arrive Monday morning) and I will update my blog with the highlights of what is happening in Tampa this week... so be sure to check in regularly.

    Friday, April 30, 2010

    On Becoming a DBA

    Perhaps the most frequent question I am asked is: How can I become a DBA?

    The answer, of course, depends a lot on what you are currently doing. Programmers who have developed applications using a database system are usually best-suited to becoming a DBA. They already know some of the trials and tribulations that can occur when accessing a database.

    If you are a programmer and you want to become a DBA, you should ask yourself some hard questions before you pursue that path. First of all, are you willing to work additional, sometimes crazy, hours? Yes, I know that many programmers work more than 40 hours already, but the requirements of the DBA job can push people to their limits. It is not uncommon for DBAs to work late into the evening and on weekends; and you better be ready to handle technical calls at 2:00 a.m. when database applications fail.

    Additionally, you need to ask yourself if you are insatiably curious. A good DBA must become a jack-of-all-trades. DBAs are expected to know everything about everything -- at least in terms of how it works with databases. From technical and business jargon to the latest management and technology fads, the DBA is expected to be "in the know." And do not expect any private time: A DBA must be prepared for interruptions at any time to answer any type of question -- and not just about databases, either.

    And how are your people skills? The DBA, often respected as a database guru, is just as frequently criticized as a curmudgeon with vast technical knowledge but limited people skills. Just about every database programmer has his or her favorite DBA story. You know, those anecdotes that begin with "I had a problem..." and end with "and then he told me to stop bothering him and read the manual." DBAs simply do not have a "warm and fuzzy" image. However, this perception probably has more to do with the nature and scope of the job than with anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization. As such, you will interact with many different people and take on many different roles. To be successful, you will need an easy-going and somewhat amiable manner.

    Finally, you should ask yourself how adaptable you are. A day in the life of a DBA is usually quite hectic. The DBA maintains production and test environments, monitors active application development projects, attends strategy and design meetings, selects and evaluates new products and connects legacy systems to the Web. And, of course: Joe in Accounting just resubmitted that query from hell that's bringing the system to a halt. Can you do something about that? All of this can occur within a single workday. You must be able to embrace the chaos to succeed as a DBA.

    Of course, you need to be organized and capable of succinct planning, too. Being able to plan for changes and implement new functionality is a key component of database administration. And although this may seem to clash with the need to be flexible and adaptable, it doesn't really. Not once you get used to it.

    So, if you want to become a DBA you should already have some experience with the DBMS, be willing to work long and crazy hours, have excellent communication and people skills, be adaptable and excel at organization. If that sounds like fun, you'll probably make a good DBA.

    Thursday, April 22, 2010

    The Ever-Changing Role of the DBA

    Defining the job of DBA is getting to be increasingly difficult. Oh, most people know the rudimentary aspects of the job, namely keeping your organization's databases and applications running up to par. The DBA has to be the resident DBMS expert (whether that is DB2, Oracle or SQL Server, or most likely a combination of those). He or she has to be able to solve thorny performance problems, ensure backups are taken, recover and restore data when problems occur, make operational changes to database structures and, really, be able to tackle any issue that arises that is data-related.

    The technical duties of the DBA are numerous. These duties span the realm of IT disciplines from database design to physical implementation and consistent, on-going monitoring of the database environment.

    DBAs must possess the abilities to create, interpret, and communicate a logical data model and to create an efficient physical database design from a logical data model and application specifications. There are many subtle nuances involved that make these tasks more difficult than they sound. And this is only the very beginning. DBAs also need to be able to collect, store, manage, and query data about the data (metadata) in the database and disseminate it to developers that need the information to create effective application systems. This may involve repository management and administration duties, too.

    After a physical database has been created from the data model, the DBA must be able to manage that database once it has been implemented. One major aspect of this management involves performance management. A proactive database monitoring approach is essential to ensure efficient database access. The DBA must be able to utilize the monitoring environment, interpret its statistics, and make changes to data structures, SQL, application logic, and the DBMS subsystem to optimize performance. And systems are not static, they can change quite dramatically over time. So the DBA must be able to predict growth based on application and data usage patterns and implement the necessary database changes to accommodate the growth.

    And performance management is not just managing the DBMS and the system. The DBA must understand SQL used to access relational databases. Furthermore, the DBA must be able to review SQL and host language programs and to recommend changes for optimization. As databases are implemented with triggers, stored procedures, and user-defined functions, the DBA must be able to design, debug, implement, and maintain these code-based database objects as well.

    Additionally, data in the database must be protected from hardware, software, system, and human failures. The ability to implement an appropriate database backup and recovery strategy based on data volatility and application availability requirements is required of DBAs. Backup and recovery is only a portion of the data protection story, though. DBAs must be able to design a database so that only accurate and appropriate data is entered and maintained - this involves creating and managing database constraints in the form of check constraints, rules, triggers, unique constraints, and referential integrity.

    DBAs also are required to implement rigorous security schemes for production and test databases to ensure that only authorized users have access to data. As industry and governmental regulations multiply, the need to audit who did what to which data when is also a requirement for sensitive data in production systems – and the DBA must be involved in ensuring data auditability without impacting availability or performance.

    And there is more! The DBA must possess knowledge of the rules of relational database management and of how many different DBMS products implement them, along with the ability to communicate those rules accurately to others. This is not a trivial task, since each DBMS differs from the others and many organizations run multiple DBMS products (e.g., DB2, Oracle, SQL Server).

    Remember, too, that the database does not exist in a vacuum. It must interact with other components of the IT infrastructure. As such, the DBA must be able to integrate database administration requirements and tasks with general systems management requirements and tasks such as network management, production control and scheduling, and problem resolution, to name just a few systems management disciplines. The capabilities of the DBA must extend to the applications that use databases, too. This is particularly important for complex ERP systems that interface differently with the DBMS. The DBA must be able to understand the requirements of the application users and to administer their databases to avoid interruption of business. This includes understanding how any ERP packages impact the business and how the databases used by those packages differ from traditional relational databases.

    But Things Are Changing

    So at a high level, DBAs are tasked with managing and assuring the integrity and efficiency of database systems. But keep in mind, too, that there are actually many different DBAs. Some focus on logical design; others focus on physical design; some DBAs specialize in building systems and others specialize in maintaining and tuning systems; and there are specialty DBAs and general-purpose DBAs. Truly, the job of DBA encompasses many roles.

    Some organizations choose to split DBA responsibilities into separate jobs. Of course, this occurs most frequently in larger organizations, because smaller organizations often cannot afford the luxury of having multiple, specialty DBAs.

    Still other companies simply hire DBAs to perform all of the tasks required to design, create, document, tune, and maintain the organization’s data, databases, and database management systems.

    But no matter what "type" of DBA you happen to be, chances are that your role is changing and adapting to new types of computing and data requirements. Indeed, one of the biggest challenges for DBAs these days is the ongoing redefinition of the job roles and responsibilities.

    The primary role of database "custodian," of course, continues to be the main emphasis of the job. But that is no longer sufficient for most organizations. The DBA is expected to take on numerous additional -- mostly technical -- roles. These can include writing application code, managing the application server, enterprise application integration, managing Web services, network administration and more.

    If you compare the job description of DBAs across several organizations, it is likely that no two of them would match exactly. This is both good and bad. It is good because it continually challenges the technically-minded employees who tend to become DBAs. But it can be bad, too; because the job differs so much from company to company, it becomes more difficult to replace a DBA who leaves or retires. And no one can deny that database administration is a full-time, stressful job all on its own. But the stress level just keeps increasing as additional duties get tacked onto the DBA's "to do" list.

    Summary

    There are many jobs that DBAs perform and it can be confusing when you try to match a job title up against the responsibilities of the job. Don't let your job title keep you from expanding into other, related disciplines. The more you know and the more you can do, the more employable you become... and that is important in this day and age!

    Thursday, April 01, 2010

    Nominate Someone for the CA IDUG Award for Outstanding Work in DB2

    As many of you know, each year CA sponsors an award at IDUG to honor outstanding work with DB2. The only requirement for the award is that you and your company have used DB2, either on the mainframe or distributed system, in a novel, ground-breaking, or cutting-edge manner.

    There is no requirement that an entrant organization be based in North America. There is also no requirement that an entrant organization license or use specific CA or other vendor database management tools, or that entrants attend the IDUG conference in person. All nominations will be judged by a panel of independent DB2 consultants in conjunction with CA executives.

    Nominations are now open through April 26, 2010. To learn more, visit ca.com/awards/db2.

    The winner will be announced at the IDUG North America conference, May 10-14, 2010.

    Each winner will receive:

    • One complimentary full IDUG conference pass for any IDUG Europe or IDUG North America conference held prior to December 2011
    • Half-day consulting engagements by each of the consultant judges within 12 months of award presentation
    • Expense reimbursement up to $1,500 for either (a) the winner’s travel to an IDUG conference held prior to December 2011 or (b) travel by one of the DB2 consultants for on-site provision of the complimentary consulting services described above
    • Recognition plaque
    Submit your project today and get recognized for your outstanding work. Visit ca.com/awards/db2.

    Wednesday, March 17, 2010

    What is Production Data?

    I received an interesting e-mail recently that made me stop and think a bit... so I thought I'd blog about it. Basically, the e-mail posed the question in the title of this blog entry – “What is production data?”

    The e-mail read as follows:


    I'm looking for a one paragraph definition of "production data". What do you think of this: "Production data is data recorded for the purpose of controlling/managing/reporting/researching events, processes or states."

    I'm trying to get around the belief that data recorded by a development team to manage its projects and resources is somehow less than production data. To me it should be regarded as the development team's "production data" and so I'm looking for a definition that satisfactorily encompasses that belief, as well as encompassing regular business production data.


    You know, I do not recall ever seeing an actual definition of the term “production data.” The above definition is a good starting point, but I do not think it is complete. The author of the e-mail makes a good point about different types of production data. The data used by an application development team to conduct their business (writing computer programs to support business processes) is definitely production data… to the application development team.

    Here is my take on a definition:

    • Production data is information that is persistently stored and used by professionals to conduct business processes. It must be accurate, documented, and managed on an on-going basis to ensure its value to the organization.

    I say information instead of data because the data must be defined and in context in order to be useful for production work. And I say persistent because even though there may be many forms of transitory data used by production processes, it is the data that is stored over periods of time that needs to be managed.

    I think this definition should serve the needs of the e-mailer... and more. What do you think?

    Did I miss anything?

    Friday, March 05, 2010

    Mainframes: The Safe IT Career Choice

    A recent Computerworld article (Bank of America touts mainframe work as a safe career) touts the mainframe as a safe haven for those considering a career in IT. This is an interesting article because the usual spiel you hear in industry trade rags is that the mainframe is dying and only a fool would work on such a platform. It is good to hear an alternate opinion on the matter in a journal as respected as Computerworld. (Of course, the fact that I agree with this opinion might have a little something to do with my cheer upon reading the article.)

    One of the highlights of this particular article is the discussion of available mainframe jobs at sites such as Monster (764 jobs over 30 days) and Dice.com (1,200 ads over 30 days). These are significant numbers of jobs, especially in a down economy.

    Another interesting tidbit from this piece is that "IBM says its mainframe revenue has grown in eight of the last 13 quarters." This is impressive when you consider the difficult server market coupled with the impression that the platform is dying.

    Speaking of the death of the mainframe, don't you believe it for a minute. People have been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM's PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash... the list is endless.

    Some may counter that they recall reading about companies that were going to eliminate their mainframe. Well, yes, I'm sure you do remember those; I do, too. But do you recall reading many articles about companies that SUCCESSFULLY eliminated their mainframes? Many tried, few succeeded. Indeed, the re-Boot Hill web site provides examples of companies that tried to eliminate the mainframe but could not (hence, they had to re-boot). If you follow the link to the re-Boot Hill site, click on the little tombstones to read the stories of failure.

    So, the mainframe is a rock-solid platform, continues to grow, and is producing a significant number of job opportunities... what is not to like?

    This blog has moved

    This blog is now located at http://db2portal.blogspot.com/.


    For feed subscribers, please update your feed subscriptions to
    http://db2portal.blogspot.com/atom.xml.

    Tuesday, February 09, 2010

    IBM Announces DB2 10 for z/OS Beta Program

    IBM announced the beta program for the next version of DB2 today, now "officially" known as DB2 10 (no more DB2 X). It is a closed beta program that will begin on March 12, 2010. That means you have to be selected by IBM to participate.

    The announcement highlighted some of the areas of improvement to be delivered by DB2 10 for z/OS, and at the top of that list, to no one's surprise, is performance. DB2 10 promises to deliver out-of-the-box CPU savings of 5% to 10% for traditional workloads and up to 20% for nontraditional workloads.

    Other areas called out by IBM in the announcement include
    • Improved business resiliency through scalability improvements and fewer outages (planned or unplanned).
    • Schema evolution or data definition on demand as well as query performance manageability enhancements support improved availability.
    • New features such as hash access, index include columns, inline large objects, parallel index updates, faster single row retrievals, work file in-memory, index list prefetch, 64-bit memory enhancements, use of the 1 MB page size of the System z10, buffer pools in memory, access path enhancements, member clustering for universal table spaces, efficient caching of dynamic SQL statements with literals, improved large object streaming, and SQL procedure language performance.
    • Rapid application and warehouse deployment for business growth including improved concurrency for data access, data management, and data definition.
    • The ability to avoid an outage by adding active log data to a subsystem.
    • Improved application and data warehousing support including temporal data, a 64-bit ODBC driver, currently committed locking, implicit casting or loose typing, timestamp with time zone, variable timestamp precision, moving sum, and moving average.
    • Improvements to DB2's XML support including expanded pureXML, customer-driven performance and usability requirements, schema validation in the engine, binary XML exchange format, multiversioning, easy update of subparts of XML document, stored procedures, user-defined functions and triggers, XML index matching with date/timestamp, and a CHECK XML utility.
    • Enhanced query and reporting facilities, including QMF V10 with over 140 new analytical functions, support for HTML, PDF, and Flash reports, and more.
    So it would seem that there is a lot of new functionality for us to begin to become acquainted with. As IBM rolls out more details, and customers begin to use the new version of DB2, we will examine some of these new features in more depth here on the DB2 Portal blog.

    If you are interested in the beta program, the pre-requisite for DB2 10 is z/OS V1.10 (5694-A01) or later running in 64 bit mode. More information about the DB2 10 beta program is available on IBM's web site.

    No GA date for DB2 10 has been announced.

    Wednesday, February 03, 2010

    IBM Manages the Data Lifecycle

    Data lifecycle is a somewhat new-ish term, at least in terms of what I plan to talk about in this blog posting. The data lifecycle – and data lifecycle management – deals with tracking, managing, and understanding data and metadata as it flows through organizations. From its inception…whether entered by a clerk or read via a feed or loaded from an external source, etc…through its various usages…whether to conduct business, analyze trends and patterns, and so on…tracked from system to system, application to application, and user to user…and finally through its end of life.

    Not many companies today can track all of their important data and what happens to it throughout its entire lifecycle. But doing so is important. Having such a capability enables organizations to adapt and react, gaining a competitive advantage. Much can go awry as data moves throughout an organization. Schemas change, policies change, regulations adapt, programs change, formats change, and so on. Any of these things can cause data quality issues, which should be brought to the attention of the business analyst using the data. But how often is this done? Knowing the history of data and its related metadata can improve business processes. But it is a major task – both for businesses and for IT vendors hoping to offer solutions.

    Which brings me to today’s (February 3, 2010) announcements from IBM. Big Blue announced new data protection software, a line of consulting services and resources and previewed information monitoring software to help organizations expand their use of trusted information to improve decision making. These moves further bolster IBM’s already formidable arsenal of data lifecycle management solutions.

    The data protection announcement was for Optim Data Redaction. This solution, engineered for unstructured data like Word documents and PDF files, automatically recognizes and removes sensitive content from documents and forms. For example, a customer’s credit scores in a loan document could be hidden from an office clerk, while still being visible to a loan officer. In today’s atmosphere of more and more stringent regulations, a data redaction solution is becoming a requirement. For example, PCI DSS industry standards dictate specific rules regarding the display of debit and credit card information on receipts and reports.

    Optim Data Redaction is planned for general availability in March 2010.

    The information monitoring announcement was for InfoSphere Business Monitor. This technology is based on a combination of work from IBM’s research group and technology gained when IBM acquired Guardium. Guardium is a database activity monitoring (or auditing) solution. InfoSphere Business Monitor tracks the quality and flow of an organization’s information and provides real-time alerts of potential flaws. For example, if a health insurance company was analyzing profit margins across different product lines (individual, group, HMO, Medicare, etc.), decision makers would immediately be alerted when a data feed from a specific geography was not successfully integrated.

    InfoSphere Business Monitor is available as a technology preview; it is not generally available and no GA date was announced.

    At the same time, IBM announced its intention to acquire Initiate Systems, a provider of data integrity software for information sharing among healthcare and government organizations. Initiate's software helps healthcare clients work more intelligently and efficiently with timely access to patient and clinical data. It also enables governments to share information across multiple agencies to better serve citizens. IBM plans to continue to support and enhance Initiate's technologies while helping clients take advantage of the broader IBM portfolio, specifically Cognos and InfoSphere solutions for BI and analytics. This acquisition bolsters IBM’s data lifecycle management offering along these verticals.

    And all of today’s announcements serve to clarify IBM’s ascent to the throne within the realm of information and data lifecycle management.

    Tuesday, February 02, 2010

    Larry Sure Knows How to Get Press

    Going under the assumption (I assume) that no press is bad press, Oracle CEO Larry Ellison has attacked IBM's DB2... but he made several factual errors in his rant.

    Here are some of the highlights (?) of the claims Ellison made about DB2 during a webcast last week.

    Regarding TPC-C benchmarks, Ellison claims to have "(blown) the doors off of IBM. We crushed them." He went on to elaborate saying "In a machine that took up less than 10% the floor space, of IBM's record setting computer. We ran faster, we ran a lot faster: using a tiny fraction of the floor space, a tiny fraction of the power, cost less."

    First of all, technicians working in trenches know that benchmarks are not indicative of real life performance. That aside, it is true that Oracle currently has the leading TPC-C benchmark result. Until late in 2009, DB2 enjoyed a massive 49% lead over Oracle. Oracle's most recent results give them a 25% lead (using more than six times as many CPU cores to do it).

    Regarding the claim of using less space and power, this is due to Oracle using flash memory and comparing it with an IBM benchmark using conventional disk technology. If Oracle compared its benchmark to an IBM system using flash memory, these claims would not stand.

    Later, Ellison claimed that "SAP chooses the Oracle Database to run under SAP in almost all their large accounts." As anyone who follows the computer industry knows, this claim is rather absurd. SAP's customers choose the DBMS to run, not SAP. And if SAP had anything to say about it, they would not recommend Oracle, their biggest competitor in the commercial business applications space. Furthermore, SAP favors DB2 for their own systems. They operate more than a thousand SAP systems, and all of those systems run on DB2.

    Perhaps the silliest of Ellison's comments is this: "The Oracle Database scales out, IBM DB2 for Unix does not. Let me see, how many servers can IBM put together for an OLTP application? Let's see, how many can they group together? Um, one. They can have up to one server attacking really big jobs. When they need more capacity, they make that server bigger. And then they take the old server out, put a bigger one in. And when you've got the biggest server, that's it. That's all they can do for OLTP." Ellison also claimed that IBM "can't scale out, they can't do cloud, they can't do clusters, they can't do any of this."

    I bet this surprised a lot of DB2 users doing these things with DB2! DB2 Parallel Edition was released in 1995, along with the capability to scale to a system of over 100 Unix servers. DB2 LUW scalability is proven in many of the world's largest OLTP environments. Consider this press release talking about how DB2 LUW powers one of the largest OLTP systems in the world.

    And what about that clustering claim? Evidently Mr. Ellison slept through 2009. IBM DB2 pureScale, released last year, offers powerful, efficient database clustering. For a cluster of 64 nodes, DB2 pureScale maintains 95% efficiency. At 128 nodes, DB2 pureScale maintains 84% efficiency. This is important because if you are growing a cluster to handle bigger workloads, you want your hardware to be doing productive work, not handling system overhead. On the other hand, Oracle RAC has a 100 server limit...

    Ellison also made other far-out claims about IBM like "They're so far behind, I don't think they have any chance at all. I'm serious." Ellison also said "They are not competitive in the database business, except on the mainframe."

    If this were true, why would Ellison spend any time thinking or talking about IBM? He must be worried, IMHO. Anyone with even a cursory knowledge of the computer industry has to admire IBM. They have led the industry in developing patents for the last 17 years. In 2009, IBM produced 4,914 patents while Oracle did not even place in the top 50 patent leaders. A search of the US Patent Office database reveals 1,588 IBM patents with "database" in the patent description, while Oracle produced only 184 such patents.

    Hyperbole is one thing, but gross inaccuracy is another. In his latest tirade, Ellison is guilty of both. Oracle makes a good DBMS... pity its CEO doesn't think it can sell it on its own merits without making up stuff about the competition.

    Monday, February 01, 2010

    Some New Year's Resolutions for DBAs

    This is sort of a re-blogging (to coin a term). I first published this last month in the Data Management Today blog I wrote for NEON. Well, I no longer work for NEON and I'm not sure how long that blog will remain active, so I thought it might make sense to re-blog some of the pertinent entries here... so here goes with my New Year's Resolutions for DBAs blog entry...

    At the beginning of every year many of us take the time to cobble together some resolutions for the coming year. We plan to lose weight, save money, stop smoking, and so on. Usually, it doesn’t take long before we’ve abandoned these resolutions. Perhaps we’d be wiser to make some business-related resolutions. With that in mind, here are some thoughts on the New Year’s resolutions you might be wise to make as a DBA in 2010.

    Are you insatiably curious? A good DBA must become a jack-of-all-trades. DBAs are expected to know everything about everything -- at least in terms of how it works with databases. From technical and business jargon to the latest management and technology fads, the DBA is expected to be "in the know." So perhaps “be more curious” would be a useful DBA resolution.

    Most DBAs know that private time is a luxury we cannot afford. A DBA must be prepared for interruptions at any time to answer any type of question -- and not just about databases, either. With that in mind, how are your people skills? The DBA is usually respected as a database guru, but just as frequently criticized as a curmudgeon with limited people skills. Just about every database programmer has his or her favorite DBA story. You know, those anecdotes that begin with "I had a problem..." and end with "and then he told me to stop bothering him and read the manual." DBAs simply do not have a "warm and fuzzy" image. However, this perception probably has more to do with the nature and scope of the job than with anything else. The DBMS spans the enterprise, effectively placing the DBA on call for the applications of the entire organization. As such, you will interact with many different people and take on many different roles. To be successful, you will need an easy-going and somewhat amiable manner. So another good New Year’s resolution might be to “improve your people skills.” Take a Dale Carnegie course or start by reading Carnegie’s seminal book, How to Win Friends and Influence People.

    How adaptable are you? A day in the life of a DBA is usually quite hectic. The DBA maintains production and test environments, monitors active application development projects, attends strategy and design meetings, selects and evaluates new products and connects legacy systems to the Web. And, of course: Joe in Accounting just resubmitted that query from hell that's bringing the system to a halt. Can you do something about that? All of this can occur within a single workday. You must be able to embrace the chaos to succeed as a DBA. So a third resolution might be to “roll with the punches” better – and without complaining!

    Of course, you need to be organized and capable of succinct planning, too. Being able to plan for changes and implement new functionality is a key component of database administration. And although this may seem to clash with the need to be flexible and adaptable, it doesn't really. Not once you get used to it. You just need to prepare yourself to be adaptable and organize yourself to incorporate change more rapidly than others. So my final suggestion for a 2010 New Year’s resolution is to adopt a planning methodology and stick to it. Buy a planner – either electronic or not – and use it this year. You might even consider taking a time management class.

    If you keep all of these resolutions, just imagine how productive you will be in 2010. And then you can use 2011 to lose weight and save money and…

    Monday, January 25, 2010

    Which is better? "BETWEEN" vs "<=" and ">="

    This was a recent topic on the DB2-L mailing list so I thought I'd weigh in with my two cents worth on the topic.

    As with most DB2 (and, indeed, IT) issues, the correct answer is "it depends!" Let's dig a bit deeper to explain what I mean.

    From a maintainability perspective, BETWEEN is probably better. The BETWEEN predicate is easier to understand and code than the equivalent combination of the less than or equal to predicate (<=) and the greater than or equal to predicate (>=). In past releases, in many cases it was more efficient, too. But today the Optimizer recognizes the two formulations as equivalent and there usually is no performance benefit one way or the other. Performance reasons aside, one BETWEEN predicate is easier to understand and maintain than multiple <= and >= predicates. For this reason, I tend to favor using BETWEEN.
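
    For example, assuming the sample EMP table also carries a numeric SALARY column (an assumption here purely for illustration), the two formulations below are logically equivalent and the Optimizer should treat them the same way:

    WHERE SALARY BETWEEN 50000.00 AND 100000.00

    WHERE SALARY >= 50000.00 AND SALARY <= 100000.00

    The BETWEEN version states the intent -- a range check on a single column -- in one predicate, which is why I lean toward it when maintainability is the deciding factor.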

    But not always. Consider the scenario of comparing a host variable to two columns. Usually BETWEEN is used to compare one column to two values, here shown using host variables:

    WHERE COLUMN1 BETWEEN :HOST-VAR1 AND :HOST-VAR2

    However, it is possible to use BETWEEN to compare one value to two columns, as shown:

    WHERE :HOST-VAR BETWEEN COLUMN1 AND COLUMN2

    This statement should be changed to

    WHERE :HOST-VAR >= COLUMN1 AND :HOST-VAR <= COLUMN2

    The reason for this exception is that a BETWEEN formulation comparing a host variable to two columns is a Stage 2 predicate, whereas the preferred formulation using >= and <= is Stage 1. Stage 1 predicates are evaluated earlier and closer to the data than Stage 2 predicates, so they can disqualify rows more cheaply. And we all know that Stage 1 outperforms Stage 2, right?

    Remember too, that SQL is flexible and often the same results can be achieved using different SQL formulations. Sometimes one SQL statement will dramatically outperform a functionally equivalent SQL statement just because it is indexable and the other is not. For example, consider this SQL statement

    SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
    FROM EMP
    WHERE MIDINIT NOT BETWEEN 'A' AND 'G';

    It is not indexable because it uses the NOT BETWEEN predicate. However, if we understand the data in the table and the desired results, perhaps we can reformulate the SQL to use indexable predicates, such as

    SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME
    FROM EMP
    WHERE MIDINIT >= 'H';

    Or we could code MIDINIT BETWEEN 'H' AND 'Z' in place of MIDINIT >= 'H'. Of course, for either of these solutions to work correctly we would need to know that MIDINIT never contains values that collate lower than 'A' (and, for the BETWEEN version, none that collate higher than 'Z').

    So, as usual, there is no one size fits all answer to the question!