Wednesday, April 01, 2015

SQL Performance Basics: Part 3, Eliminating Predicates?

It is technically possible to learn how to write SQL statements without having an in-depth knowledge of the data. However, the better you know your data, the better your application performance will be. Let's look at a simple example.

By reducing the number of predicates on your SQL statements you may be able to  achieve better performance by:
  1. Reducing BIND (and REBIND) time because fewer options will probably need to be examined by the DB2 Optimizer.
  2. Reducing execution time due to a smaller path length caused by the removal of search criteria from the optimized access path. DB2 will always make sure that it processes each predicate coded for the SQL statement. Removing predicates removes work -- and less work equals less time to process the SQL.
Of course, you have to make sure that you can actually remove predicates without impacting the result set of your query, right? But sometimes - if you know your data - there are cases where you can eliminate predicates.

Consider the following SQL statement:

 SELECT FIRST_NAME, LAST_NAME, GRADE_LEVEL
 FROM   EMPLOYEE
 WHERE  TITLE = 'VP'
 AND    GRADE_LEVEL >= 10;


This statement retrieve all rows for vice presidents who are at a grade level of 10 or above. But, what if we know more about our data? Say, for example, that the starting grade level for vice presidents in our organization is 10. Therefore, it is impossible for anyone with a lower grade level to achieve the title of VP. That makes the second predicate redundant in this case. If we remove this predicate it will not logically change the results, but with less checking of the data required (DB2 won't have to check for GRADE_LEVEL >= 10) performance may be improved.

It is important though that you truly do "know your data." For example, it is not sufficient to merely note that for current rows in the EMPLOYEE table, no vice presidents are at a grade level below 10. This may just be a coincidence. Do not base your knowledge of your data on the current state of the data. You must truly know your business criteria to determine that  a correlation between two columns (such as between GRADE_LEVEL and TITLE) actually exists. And only then should you modify your SQL. Failure to do so can result in incorrect results being returned.

Also, if the predicate was already there and you are removing it, comment out the predicate instead and be sure to document exactly why you are doing so in the code... that way, when somebody else takes a look at it later they'll know what happened and why.

1 comment:

Craig S. Mullins said...

Whenever any of the shortcuts outlined in this post are used, be sure to document that fact. Use comments in the SQL statement to record the reason behind the formulation of the SQL. Failure to do so can cause programmers to undo the change during future enhancements. At the least, confusion may result as to why the statement was formulated that particular way. So, for example:

--
-- RETRIEVE ALL VPS WITH A GRADE LEVEL OF
-- AT LEAST 10.
--
-- THE SECOND PREDICATE IS COMMENTED
-- OUT BECAUSE THE STARTING GRADE LEVEL
-- OF VICE PRESIDENTS AT OUR COMPANY IS 10.
--
SELECT FIRST_NAME, LAST_NAME, GRADE_LEVEL
FROM EMPLOYEE
WHERE TITLE = 'VP'
-- AND GRADE_LEVEL >= 10;