Saturday, April 11, 2015

SQL Performance Basics: Part 5, Avoid Sorting

Sorting is a very costly operation that you should strive to avoid if at all possible. 
Indexes are very useful for this purpose. DB2 can choose to use an available index 
to satisfy the sorting requirements of an ORDER BY, GROUP BY, or DISTINCT clause. 
Therefore, it can be beneficial to create indexes that correspond to these clauses 
for frequently executed SQL statements. 
For example, consider this SQL statement:
      SELECT    LAST_NAME, TITLE, SALARY
      FROM      EMPLOYEE
      ORDER BY  LAST_NAME, TITLE;
 
If an index on LAST_NAME and TITLE exists, DB2 can use it to avoid sorting. By 
retrieving the data in order from the index, sorting becomes unnecessary.
You can use this information to tune your SQL further to avoid sorting. When 
ordering is required only by one column, sometimes adding another column to 
the ORDER BY can enable indexed access to avoid the sort. 
Consider this SQL statement:
      SELECT    DEPT_CODE, MANAGER, DEPT_NAME,
      FROM      DEPARTMENT
      ORDER BY  MANAGER, DEPT_CODE;
 
Say there is a unique index on MANAGER. DB2 will probably use this index to 
satisfy the request. However, a sort will be done after retrieving the data to 
put the results into DEPT_CODE order within MANAGER. But, since we know 
our data, we are able to change this situation. Because MANAGER is unique 
we know that the following SQL statement is equivalent to the prior one: 
      SELECT    DEPT_CODE, MANAGER, DEPT_NAME,
      FROM      DEPARTMENT
      ORDER BY  MANAGER;
 
In this case, DB2 can use the index on MANAGER to avoid the sort. The extra 
column, DEPT_CODE, is removed from the ORDER BY clause. But, since MANAGER 
is unique, there can be at most one DEPT_CODE per MANAGER. Therefore, the 
sort order will be equivalent. Because we knew our data we removed a sort from 
the process and made our query more efficient!
One final note on sorting: although most of you probably know this, it cannot be 
stated too strongly (or frequently) - always code an ORDER BY if the order of the 
results of your query is important. The ORDER BY clause is the only foolproof 
method of ensuring appropriate sort order of query results. Simply verifying that 
access is via an index is not sufficient to ensure the order of the results because:
  • The access path may change in the future to non-indexed access.
  • Even indexed access does not always return ordered results (e.g. list prefetch).
UNION VERSUS UNION ALL
Another simple, yet effective, means of enhancing SQL performance is to 
understand how UNION, EXCEPT and INTERSECT work. Let's start with UNION 
because it has been around in DB2 the longest. UNION takes the results of 
multiple SELECT statements and combines them together. It does this, as part 
of the UNION operation, by sorting the results and removing any duplicates. 
UNION ALL, on the other hand, will not sort and will not remove duplicates.
If you know that the results of the queries being unioned together are distinct 
(that is, no duplicates will be encountered), then you can use UNION ALL instead 
of UNION, and thereby enhance performance by avoiding the sort. Additionally, 
if you do not care whether duplicates are returned, always use UNION ALL. 
The same applies to the EXCEPT and INTERSECT operations, which were added 
as of DB2 9 for z/OS.

No comments: