Tuesday, March 24, 2015

Time to Start Your IDUG in Philadelphia Planning

Spring is in the air... well, at least it is South of the Mason-Dixon line... and that means it is time to plan your trip to this year's IDUG North American DB2 Tech Conference.

Anybody who has ever attended an IDUG conference knows about all of the good things you can expect to encounter, experience and learn at the event. That includes technical session on all of the latest and greatest DB2 technologies and features, networking opportunities to meet IBM developers and industry consultants, and the vendor exhibit hall where you can learn about software and tools to help you get the most out of DB2... not to mention picking up a few tchotchkes and mementos.

Additionally, there are Special Interest Groups (where you can discuss the latest industry trends and topics with other interested technicians), Hands-On Labs (delivering working training led by IBM), and the opportunity to take free IBM certification exams.

But there's even more... not only is this year's conference in Philadelphia, PA, a new venue for IDUG, but the half- and full-day educational seminars (that used to require an additional fee) are included in your registration fee.

I'll be presenting my DB2 Performance Roadmap presentation  at this year's IDUG, so be sure to stop in to say "Hi" and chat about DB2, big data, or your favorite topic du jour!

The IDUG DB2 Tech Conference is the place to be to learn all about DB2 from IBMers, gold consultants, IBM champions, end users, and more. With all of this great stuff going on in Philadelphia this May, why wouldn't you want to be there!?!?


Download the IDUG brochure for more details.

Monday, March 16, 2015

SQL Performance Basics: Part 2, Rely on Indexes

Perhaps the single most important aspect of SQL tuning is indexing. All developers should know all of the indexes that exist on any table upon which they write SQL statements. When an index exists on a column (or columns), DB2 can use the index to reduce I/O instead of scanning the entire table to satisfy a predicate. 

For critical queries, where no useful indexes exists, it usually makes sense to create an index to improve query performance. Of course, developers should enlist the assistance of a DBA to ensure the creation of appropriate indexes.

Let's learn with Bsome basics. For example, consider this SQL statement:
    SELECT   LASTNAME, SALARY 
    FROM     EMP 
    WHERE    EMPNO = '000010' 
    AND      DEPTNO =  'D01'
What index or indexes would make sense for this simple query? ""'First, think about all the possible indexes that you could create. Your first short list probably looks something like this:
  • Index1 on EMPNO
  • Index2 on DEPTNO
  • Index3 on EMPNO and DEPTNO
This is a good start, and Index3 is probably the best of the lot. It lets DB2 use the index to immediately look up the row or rows that satisfy the two simple predicates in the WHERE clause. Of course, if you already have a lot of indexes on the EMP table, you might want to examine the impact of creating yet another index on the table. There are several factors to consider. 

Indexing Factors to Consider
For starters, you need to weigh the impact of data modification. DB2 will automatically maintain every index you create. This means every INSERT and every DELETE to this table will insert and delete not just from the table, but also from its indexes. And if you UPDATE the value of a column that is in an index, you also update the index. So, indexes speed the process of retrieval but slow down modification.
You should also consider the impact to any existing indexes and applications before creating a new index. If an index already exists on EMPNO or DEPTNO, it might not be wise to create another index on the combination. However, it might make sense to change the other index to add the missing column. But not always, because the order of the columns in the index can make a big difference depending on the query. For example, consider this query:
SELECT   LASTNAME, SALARY 
FROM     EMP 
WHERE    EMPNO = '000010' 
AND      DEPTNO >  'D01';
In this case, EMPNO should be listed first in the index. And DEPTNO should be listed second, allowing DB2 to do a direct index lookup on the first column (EMPNO) and then a scan on the second (DEPTNO) for the greater-than.
Furthermore, if indexes already exist for both columns (one for EMPNO and one for DEPTNO), DB2 can use them both to satisfy this query so creating another index might not be necessary.
Finally, you should consider the importance of the query you are attempting to tune. The more important the query, the more you might want to tune by index creation. Of course, the term "importance" is not always easy to quantify. If you are coding a query that the CEO will run every day, you want to make sure it delivers optimal performance. So building indexes for that particular query is important. On the other hand, a query for a clerk might not necessarily be weighted as high, so that query might have to make do with the indexes that already exist. Of course, the decision depends on the application's importance to the business--not just on the user's importance. If the clerk runs business critical transactions and the CEO is simply printing off a report for later consumption, then the importance varies... right?

Index Overloading
Index design involves much more than I have covered so far. For example, you might consider index overloading to achieve index-only access. If all the data that a SQL query asks for is contained in the index, DB2 might be able to satisfy the request using only the index. Consider our previous SQL statement. We asked for LASTNAME and SALARY, given information about EMPNO and DEPTNO. And we also started by creating an index on the EMPNO and DEPTNO columns. If we include LASTNAME and SALARY in the index as well, we never need to access the EMP table because all the data we need exists in the index. This technique can significantly improve performance because it cuts down on the number of I/O requests.
Keep in mind that making every query an index-only access is not prudent or even possible. You should save this technique for particularly troublesome or important SQL statements.

The Bottom Line
A solid understanding of the indexes that exist -- and how additional indexes can help -- is vital to the performance of your DB2 applications. Take the time to understand the indexes that exist for your applications and you can become a better developer, becoming more valuable to your organizations and earning the respect of your peers!

Tuesday, March 10, 2015

SQL Performance Basics: Ask For Only What You Need

As you write SQL statements to access DB2 data, there are certain very simple, yet important rules to follow to encourage efficient SQL. Of course, SQL performance is a complex topic and to understand every nuance of how SQL performs can take a lifetime to master. That being said, adhering to the following simple rules puts you on the right track to achieving high-performing DB2 applications.

The first rule is to always provide only the exact columns that you need to retrieve in the SELECT-list of each SQL SELECT statement. Sometimes this is shortened to “do not use SELECT *”. The shorthand SELECT * means retrieve all columns from the table(s) being accessed. This is fine for quick and dirty queries but is bad practice for inclusion in application programs because DB2 tables may need to be changed in the future to include additional columns. SELECT * will retrieve those new columns, too, and your program may not be capable of handling the additional data without requiring time-consuming changes. 

Even if the program needs every column, it is better to explicitly ask for each column by name in the SQL statement for clarity and to avoid accessing new and unwanted columns.
DB2 consumes additional resources for every column you specify to be returned. If the program does not need the data, it should not ask for it.

The second part of asking only for what you need is using the WHERE clause to filter data in the SQL instead of bringing it all into your program. This is a common rookie mistake. It is much better for DB2 to filter the data before returning it to your program. This is true because DB2 uses additional I/O and CPU resources to obtain each row of data. The fewer rows passed to your program, the more efficient your SQL will be. So, the following SQL

    SELECT  EMPNO, LASTNAME, SALARY
    FROM    EMP
    WHERE   SALARY > 50000.00;


Is better than simply reading all of the data without the WHERE clause and then checking each row to see if the SALARY is greater than 50000.00 in your program.