Thursday, July 16, 2015

Influencing the DB2 Optimizer: Part 3 - Tweaking SQL Statements

In Part 2 of this series we took a look at standard methods of influencing the DB2 optimizer, essentially boiling down to running RUNSTATS, reorganizing, encouraging parallelism and indexing.

But sometimes you may not want to change the DB2 Catalog statistics (or use hints, which we will cover later in this series). In such situations, you might consider tweaking the offending SQL statement. Tweaking is the process of changing a statement in a non-intuitive fashion, without altering its functionality. Let's take a look at some of these methods...

At times, you may need to disable a specific index from being considered by the optimizer. One method of achieving this is to append OR 0 = 1 to the predicate. For example, consider a query against the EMP table on which two indexes exist: one on EMPNO and one on WORKDEPT. Appending OR 0 = 1 (as shown next) to the WORKDEPT predicate will cause DB2 to avoid using an index on WORKDEPT.

    SELECT  EMPNO, WORKDEPT, EDLEVEL, SALARY
    FROM    EMP
    WHERE   EMPNO BETWEEN ‘000020’ AND ‘000350’
    AND     (WORKDEPT > ‘A01’ OR 0 = 1);


The OR 0 = 1 clause does not change the results of the query, but it can change the access path chosen. 

There are other tweaks that can be used to dissuade the DB2 optimizer from choosing a specific index, such as multiplying or dividing by 1, adding or subtracting 0, or appending an empty string to a character column. 


Another method of tweaking SQL to influence DB2’s access path selection is to code redundant predicates. When DB2 calculates the filter factor for a SQL statement, it multiplies the filter factors for all predicates connected with AND.

You can lower the filter factor of a query by adding redundant predicates. For example, consider the following statement:

    SELECT LASTNAME
    FROM   DSN81010.EMP
    WHERE  WORKDEPT = :VAR;

You can attempt to lower the filter factor for the predicate by adding redundant predicates, for example:

    SELECT LASTNAME
    FROM   DSN81010.EMP
    WHERE  WORKDEPT = :VAR
    AND    WORKDEPT = :VAR
    AND    WORKDEPT = :VAR;

The two predicates added to the end are redundant and do not affect SQL statement functionally. However, DB2 calculates a lower filter factor, which increases the possibility that an index on the WORKDEPT column will be chosen. The lower filter factor also increases the possibility that the table will be chosen as the outer table, if the redundant predicates are used for a join.

When redundant predicates are added to enhance performance, be sure to document the reasons for the extra predicates. Failure to do so may cause a maintenance programmer to assume that the redundant predicates are an error and thus remove them. 

Another option for getting a small amount of performance out of an SQL statement is to change the physical order of the predicates in your SQL code. DB2 evaluates predicates first by predicate type, then according to the order in which it encounters the predicates. The four types of SQL predicates are listed in the order that DB2 processes them:
  • Equality, in which a column is tested for equivalence to another column, a variable, or a literal
  • Ranges, in which a column is tested against a range of values (for example, greater than, less than, or BETWEEN)
  • IN, where a column is tested for equivalence against a list of values
  • Stage 2 predicates

Understanding this hierarchy, you can place the most restrictive predicates at the beginning of your predicate list. For example, consider the following query:

    SELECT   LASTNAME
    FROM     DSN8810.EMP
    WHERE    WORKDEPT = ‘A00’
    AND      SEX = ‘M’;

The first predicate has a lower filter factor than the second because there are fewer workers in department A00 than there are males in the entire company. This probably will not increase performance by much, but it can shave a little off a query’s processing time.

Before deciding to tweak SQL statements to achieve different access paths, remember that you are changing SQL code in a non-intuitive fashion. For each modification you make to increase performance, document the reasons in the program, the data dictionary or repository (if one exists), and the system documentation. Otherwise, the tweaked SQL could be maintained after it is no longer required, or modified away when it still is required for performance.

Also remember that the changes could enhance performance for one release of DB2 but result in no gain or decreased efficiency in subsequent releases. Re-examine your SQL for each new version and release of DB2.

No comments: