Suppose that you need to do query tuning in a system that suffers from performance problems. Users may ask you to tune very specific queries, but they may also ask you to improve the system’s performance as a whole. This raises the question: which queries should you tune? There could be hundreds of different query patterns in the system, and it simply won’t be an efficient use of your time to try to tune them all. You need a smart process that helps you isolate a fairly small number of queries: the ones that, once tuned, will have the biggest impact on the system’s performance. This article describes a strategy that will help you prioritize queries for tuning. It also provides a solution for implementing the strategy using the enhanced Extended Events in SQL Server 2012.

Sample Data and Queries

For sample data I will use a database called Performance. The database has a table called Orders, with 1,000,000 rows. You can find the source code to create and populate the sample database here: http://tsql.solidq.com/books/source_code/Performance.txt.

Initially, the Orders table has a clustered index with the orderdate column as the key. This index allows queries for orders placed on a certain date or within a range of dates to perform well. To make such queries perform badly (for demonstration purposes), drop the clustered index by running the following code:

-- drop clustered index
USE Performance;
DROP INDEX idx_cl_od ON dbo.Orders;

Listing 1 contains a set of sample queries that represent a typical workload in the target system you are tuning.

Listing 1: Sample Queries

USE Performance;
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderid = 3;
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderid = 5;
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderid = 7;
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate = '20080212';
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate = '20080118';
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate = '20080828';
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080101'
AND orderdate < '20080201';
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080401'
AND orderdate < '20080501';
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080201'
AND orderdate < '20090301';
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080501'
AND orderdate < '20080601';

After dropping the clustered index, the queries that filter rows by orderdate won’t perform well.
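
To see the problem for yourself before tracing anything, you can run one of the range queries from Listing 1 with I/O statistics enabled; without the clustered index, the plan is a full table scan:

-- demonstrate the problem: without the clustered index, a date range
-- query scans the entire table (note the logical reads reported)
SET STATISTICS IO ON;
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080101'
AND orderdate < '20080201';
SET STATISTICS IO OFF;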

Strategy to Prioritize Queries for Tuning

You can use a tracing tool like SQL Trace or Extended Events to trace the workload against the target system. The problem is: of the many queries in the workload, which ones are the most important to tune? You need a strategy that helps you isolate a fairly small subset of queries, so that your time is spent where it matters.

One strategy is to use a filter in the trace that isolates only those queries that exceed a certain threshold in a performance metric of interest. For example, if your goal is to reduce I/O costs in the system because the I/O subsystem is overloaded, you can focus on I/O-related metrics like logical reads (and writes, if you’re also tuning modifications), and filter only those queries where the number of reads is greater than some value that you have decided on as the threshold. If it’s run times that you’re trying to reduce, you can focus on a measurement like duration, and filter only those queries that run longer than your chosen threshold.
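
For illustration, a filtered session along these lines could look like the following sketch; the session name, target file path, and threshold value are placeholders that you would adapt to your system:

-- illustrative sketch: capture only statements exceeding a reads threshold
CREATE EVENT SESSION expensive_queries ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
    WHERE logical_reads > 100000)
ADD TARGET package0.event_file(SET filename = N'C:\Temp\Traces\expensive_queries.xel');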

While there may be some interest in identifying such queries, there is a problem with this strategy. Say you’re currently focusing on I/O, specifically reads, and suppose that your trace filters only queries that perform more than 100,000 logical reads. Suppose that there’s some query Q1 that generates 500,000 reads and runs a couple of times a month, and another query Q2 that generates 1,000 reads but runs a million times a month. Which query is more important to tune from the overall system’s perspective? Clearly Q2: it contributes about a billion reads a month, whereas Q1 contributes only about a million. Unfortunately, the trace will filter out Q2 because it doesn’t meet the minimum threshold for the number of reads in a single execution. In short, you need a different, smarter strategy.

A better strategy is not to filter by a measurement of performance, but rather to capture all the queries in the workload. After collecting the events, you want to aggregate the relevant performance metric for all occurrences of each query, and then prioritize the queries for tuning, based on the aggregated measure. The problem is that this sort of grouping can be tricky to achieve—at least prior to SQL Server 2012. You don’t want to group the events by the original query string, but rather by the query template. Consider the following two queries:

SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080201'
AND orderdate < '20090301';
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080501'
AND orderdate < '20080601';

The two queries are not identical, but they do share the same template, so you want to consider them as part of the same group. Before SQL Server 2012, whether you used SQL Trace or Extended Events, this was tricky to achieve. You basically had to write your own parser to parse the query strings and replace the input values with a common symbol, thus generating a template form of the query. Then you needed to compute some hash or checksum value based on the template form, and group the queries by that value, as sketched below.
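
To make the idea concrete, here is the essence of that pre-2012 approach (only a sketch, not a real parser): once the input values are replaced with placeholders, both queries above reduce to the same template string, and a checksum of that string can serve as the grouping key. The placeholder names @p1 and @p2 are arbitrary:

-- sketch: the template form shared by both queries, with input values
-- replaced by placeholders; a checksum of the template serves as a group key
DECLARE @template AS NVARCHAR(4000) = N'SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= @p1
AND orderdate < @p2;';
SELECT CHECKSUM(@template) AS template_checksum;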

Fortunately, SQL Server 2012 improves Extended Events in a way that simplifies the solution significantly. It enhances some of the events, like sql_statement_completed and sp_statement_completed, by supporting an action called query_hash. This action is a 64-bit unsigned integer in the Extended Events system (represented as a BINARY(8) value in SQL Server) that holds a hash of the parsed template form of the query. The query_hash value is the same for similar queries (queries based on the same template, but with different input values), and different for queries that aren’t similar. So all of the complexities associated with creating and using your own parser are eliminated; SQL Server does the work for you by providing the query_hash action.
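
You can verify that the query_hash action is available in your instance by querying the Extended Events metadata views:

-- check for the query_hash action in the Extended Events metadata
SELECT p.name AS package_name, o.name AS action_name, o.description
FROM sys.dm_xe_objects AS o
  INNER JOIN sys.dm_xe_packages AS p
    ON p.guid = o.package_guid
WHERE o.object_type = 'action'
  AND o.name = 'query_hash';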

Implementing the Strategy

Now that you have a strategy in place, as well as support for the query_hash action in Extended Events, it’s time to implement the strategy.

The first step is to create an Extended Events session. If queries are submitted as ad hoc statements, you need to capture the sql_statement_completed event. If queries are submitted from stored procedures, you need to capture the sp_statement_completed event. You also need to decide whether you want to apply any filters. For example, if you need to capture only activity in a specific database, you can add a filter based on the database ID. Also, if you want to discard events that don’t represent queries, you can filter only those events where the query_hash value is different from 0. Here’s the code to create an Extended Events session called query_performance, which captures both events, applies the aforementioned filters, and stores the XML event data in a file target:

-- create Extended Events session
CREATE EVENT SESSION query_performance ON SERVER
ADD EVENT sqlserver.sp_statement_completed(
    ACTION(sqlserver.query_hash)
    WHERE sqlserver.database_id = 6 AND sqlserver.query_hash <> 0),
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.query_hash)
    WHERE sqlserver.database_id = 6 AND sqlserver.query_hash <> 0)
ADD TARGET package0.event_file(SET filename = N'C:\Temp\Traces\query_performance.xel');

Note that in my environment the Performance database has database ID 6; you will need to use the relevant database ID in your environment. If you want to specify the database name and let the system figure out the database ID for you, you will need to use dynamic SQL. Also, make sure that you specify a target folder and file name that are suitable in your system.
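
Here is a minimal sketch of the dynamic SQL approach, reduced to a single event for brevity and using the same placeholder file path; it assumes the Performance database exists:

-- sketch: resolve the database ID from the database name with dynamic SQL
DECLARE @dbid AS INT = DB_ID(N'Performance');
DECLARE @sql AS NVARCHAR(MAX) = N'
CREATE EVENT SESSION query_performance ON SERVER
ADD EVENT sqlserver.sql_statement_completed(
    ACTION(sqlserver.query_hash)
    WHERE sqlserver.database_id = ' + CAST(@dbid AS NVARCHAR(10)) + N'
      AND sqlserver.query_hash <> 0)
ADD TARGET package0.event_file(SET filename = N''C:\Temp\Traces\query_performance.xel'');';
EXEC sys.sp_executesql @sql;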

When you’re ready to start the session, run the following code:

-- start session
ALTER EVENT SESSION query_performance
ON SERVER
STATE = START;

To mimic a workload in the target system, execute the set of sample queries from Listing 1 a few times. I executed the code five times.
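
One convenient way to repeat the workload in SQL Server Management Studio or sqlcmd is to append a count to the GO batch separator; placed at the very end of the Listing 1 script, the following line repeats the whole batch five times:

-- GO followed by a count repeats the preceding batch (an SSMS/sqlcmd
-- feature, not T-SQL); put it at the end of the Listing 1 script
GO 5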

When you are ready to stop the session, run the following code:

-- stop session
ALTER EVENT SESSION query_performance
ON SERVER
STATE = STOP;

If you’re not planning to run the session again in the future, and wish to drop its definition, use the following code:

-- drop session
DROP EVENT SESSION query_performance ON SERVER;

The traced event data is now stored in the target files, which in my system are named C:\Temp\Traces\query_performance*.xel. To obtain the event data from the files, use the table function sys.fn_xe_file_target_read_file. You can convert the event data to XML and then query the XML to extract information. The following code uses this function to extract the event data from the target files and store it in a temporary table called #Events:

-- copy event data into temp table #Events
SELECT CAST(event_data AS XML) AS event_data_XML
INTO #Events
FROM sys.fn_xe_file_target_read_file('C:\Temp\Traces\query_performance*.xel', null, null, null) AS F;

To analyze the event data, you first need to turn it into relational form. You can use the .value method to extract the various data columns and actions from the XML representing each event. The following code demonstrates this extraction, storing the relational information for the queries in a temporary table called #Queries:

-- extract query performance info into temp table #Queries
-- (Extended Events reports duration and cpu_time in microseconds)
SELECT
  event_data_XML.value ('(/event/action[@name=''query_hash''    ]/value)[1]', 'BINARY(8)'     ) AS query_hash,
  event_data_XML.value ('(/event/data  [@name=''duration''      ]/value)[1]', 'BIGINT'        ) AS duration,
  event_data_XML.value ('(/event/data  [@name=''cpu_time''      ]/value)[1]', 'BIGINT'        ) AS cpu_time,
  event_data_XML.value ('(/event/data  [@name=''physical_reads'']/value)[1]', 'BIGINT'        ) AS physical_reads,
  event_data_XML.value ('(/event/data  [@name=''logical_reads'' ]/value)[1]', 'BIGINT'        ) AS logical_reads,
  event_data_XML.value ('(/event/data  [@name=''writes''        ]/value)[1]', 'BIGINT'        ) AS writes,
  event_data_XML.value ('(/event/data  [@name=''row_count''     ]/value)[1]', 'BIGINT'        ) AS row_count,
  event_data_XML.value ('(/event/data  [@name=''statement''     ]/value)[1]', 'NVARCHAR(4000)') AS statement
INTO #Queries
FROM #Events;
CREATE CLUSTERED INDEX idx_cl_query_hash ON #Queries(query_hash);
-- examine query info
SELECT * FROM #Queries;

The code also creates a clustered index based on the query_hash value as the key, and then queries the table to show the extracted query info. Table 1 shows an abbreviated form of the output that I got in my system from the last query.

Table 1. Query Performance Info

Now the #Queries table holds all the information that you need to prioritize the queries and isolate the ones that are most important to tune. You can group the queries by the query_hash value and aggregate performance measures of interest. You can then sort the result based on one of the metrics (for example, the sum of logical reads, descending) to discover which queries to focus on. You can also use window functions to compute the percentage that each query template contributes to the grand total, as well as the running total percentage. Here’s the complete query implementing this logic:

-- group by query hash and compute running total percent
SELECT query_hash,
  COUNT(*) AS num_queries,
  SUM(logical_reads) AS sum_logical_reads,
  CAST(100.0 * SUM(logical_reads)
             / SUM(SUM(logical_reads)) OVER() AS NUMERIC(5, 2)) AS pct,
  CAST(100.0 * SUM(SUM(logical_reads)) OVER(ORDER BY SUM(logical_reads) DESC
                                            ROWS UNBOUNDED PRECEDING)
             / SUM(SUM(logical_reads)) OVER()
       AS NUMERIC(5, 2)) AS running_pct
FROM #Queries
GROUP BY query_hash
ORDER BY sum_logical_reads DESC;

This query applies a windowed SUM aggregate to a grouped SUM aggregate, so you end up with SUM(SUM(…)). If you understand window functions, you will know that this is perfectly valid logic. But if your first reaction when you see this sort of expression is, “I need to go to the optician to get my eyes examined!”, you’d probably be better off handling the computation in two steps, using a table expression such as:

-- simplified
WITH QueryHashTotals AS
(
  SELECT query_hash,
    COUNT(*) AS num_queries,
    SUM(logical_reads) AS sum_logical_reads
  FROM #Queries
  GROUP BY query_hash
)
SELECT query_hash, num_queries, sum_logical_reads,
  CAST(100. * sum_logical_reads
            / SUM(sum_logical_reads) OVER()
       AS NUMERIC(5, 2)) AS pct,
  CAST(100. * SUM(sum_logical_reads) OVER(ORDER BY sum_logical_reads DESC
                                          ROWS UNBOUNDED PRECEDING)
            / SUM(sum_logical_reads) OVER()
       AS NUMERIC(5, 2)) AS running_pct
FROM QueryHashTotals
ORDER BY sum_logical_reads DESC;

Either way, Table 2 shows the output I got for this query on my system.

Table 2. Running Total Percentages by Query Hash

My sample workload had only three different query templates, but when you run this process in a production environment, you will normally get many more. When you do, you will typically want to filter only the queries up to the point where the running total percentage first reaches a certain threshold. A good rule of thumb is to set the threshold to 80 percent, because in many systems a fairly small number of the most expensive queries contribute around 80 percent of the total cost. Having done that, you can focus your tuning efforts on the small number of queries that matter most.

The following code adds the logic to handle this sort of filter, and it uses a subquery to retrieve, for each filtered query template, a sample query from the #Queries table, so that you will know what kinds of queries you need to tune:

-- filter and include a sample query
WITH QueryHashTotals AS
(
  SELECT query_hash,
    COUNT(*) AS num_queries,
    SUM(logical_reads) AS sum_logical_reads
  FROM #Queries
  GROUP BY query_hash
),
RunningTotals AS
(
  SELECT query_hash, num_queries, sum_logical_reads,
    CAST(100. * sum_logical_reads
              / SUM(sum_logical_reads) OVER()
         AS NUMERIC(5, 2)) AS pct,
    CAST(100. * SUM(sum_logical_reads) OVER(ORDER BY sum_logical_reads DESC
                                             ROWS UNBOUNDED PRECEDING)
              / SUM(sum_logical_reads) OVER()
         AS NUMERIC(5, 2)) AS running_pct
  FROM QueryHashTotals
)
SELECT RT.*, (SELECT TOP (1) statement
              FROM #Queries AS Q
              WHERE Q.query_hash = RT.query_hash) AS sample_statement
FROM RunningTotals AS RT
WHERE running_pct - pct < 80.00
ORDER BY sum_logical_reads DESC;

On my system, with my simplistic sample workload, there was one query template that contributed 80 percent of the entire workload, as shown in Table 3. In a more realistic production environment, you will typically get a number of query templates that together account for 80 percent.

Table 3. Query Performance Info

Analyzing the problematic query and the indexing (or lack thereof) in my system makes it evident that a clustered index on orderdate can significantly improve performance. Run the following code to recreate the missing index:

-- create clustered index
CREATE CLUSTERED INDEX idx_cl_od ON dbo.Orders(orderdate);
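
To confirm the improvement, you can repeat the earlier I/O check; the date range query should now perform a seek in the new clustered index and report far fewer logical reads:

-- verify the fix: the same date range query should now report
-- only a small number of logical reads (an index seek, not a scan)
SET STATISTICS IO ON;
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate >= '20080101'
AND orderdate < '20080201';
SET STATISTICS IO OFF;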

When you’re done analyzing the event information, you can drop the temporary tables that were used for the analysis by running the following code:

-- cleanup
DROP TABLE #Events, #Queries;

Querying sys.dm_exec_query_stats

Another way to prioritize queries for tuning is to query the sys.dm_exec_query_stats DMV. This view reports performance information for queries that have a plan in cache. Starting with SQL Server 2008, the view provides a query_hash column similar to the action discussed earlier, so you can prioritize queries for tuning by querying this view and related objects, without needing to create an Extended Events session. As an example, the following code prioritizes queries based on logical reads, following the same principles described earlier:

WITH RunningTotals AS
(
  SELECT
    query_hash,
    SUM(execution_count) AS num_queries,
    SUM(total_logical_reads) AS sum_logical_reads,
    CAST(100. * SUM(total_logical_reads)
              / SUM(SUM(total_logical_reads)) OVER()
          AS NUMERIC(5, 2)) AS pct,
    CAST(100. * SUM(SUM(total_logical_reads)) OVER(ORDER BY SUM(total_logical_reads) DESC
                                              ROWS UNBOUNDED PRECEDING)
              / SUM(SUM(total_logical_reads)) OVER()
          AS NUMERIC(5, 2)) AS running_pct
  FROM sys.dm_exec_query_stats AS QS
    CROSS APPLY sys.dm_exec_query_plan(QS.plan_handle) AS QP
  WHERE QS.query_hash <> 0x
    AND QP.dbid = DB_ID('Performance')
  GROUP BY query_hash
)
SELECT RT.*,
  (SELECT TOP (1)
     SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
             ((CASE QS.statement_end_offset
                WHEN -1 THEN DATALENGTH(ST.text)
                ELSE QS.statement_end_offset END
                    - QS.statement_start_offset)/2) + 1
           )
   FROM sys.dm_exec_query_stats AS QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST
   WHERE QS.query_hash = RT.query_hash) AS sample_statement
FROM RunningTotals AS RT
WHERE running_pct - pct < 80.00
ORDER BY sum_logical_reads DESC;

The problem with this approach is that the view only has information about queries that currently have a plan in cache. You won’t find information about queries whose plans aren’t cached, such as those executed with the RECOMPILE option. In short, the Extended Events option is more complete and more precise, and it gives you more control, so it is usually the preferred one.
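
For example, a query executed with the RECOMPILE hint leaves no plan in cache, so it never shows up in sys.dm_exec_query_stats, even though an Extended Events session would capture its execution:

-- the RECOMPILE hint prevents the plan from being cached, so
-- sys.dm_exec_query_stats will have no row for this query
SELECT orderid, custid, empid, shipperid, orderdate, filler
FROM dbo.Orders
WHERE orderdate = '20080212'
OPTION (RECOMPILE);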

Conclusion

When you need to optimize queries in your system, you first need to optimize the process that identifies which queries deserve tuning; otherwise, you will waste effort on queries that give you little return on the time you invest. Before SQL Server 2012, the process of identifying queries that deserve tuning required you to develop your own parser in order to derive a template form of your queries. In SQL Server 2012, Extended Events already provides, with some events, a query hash value representing the query template, which significantly simplifies the process of prioritizing queries for tuning. This article has demonstrated how to trace query performance using Extended Events in SQL Server 2012, and how to analyze the event data to prioritize and isolate the queries that are most important to tune.

Technical Reviewer: Herbert Albert.