I have been playing around with the latest SQL Server 2008 CTP while trying to keep both my performance tuning and DBA classes updated with the latest changes. When I review new releases of SQL Server, I tend to evaluate the release by determining which features and functionalities that will help me solve problems that I could not solve or could not solve easily in earlier releases.
One of the problems that did not have easy solutions in earlier versions was the one faced when dealing with large amounts of data. Many times I have incorporated many different strategies to lessen the impact of large amounts of data on queries by trying to lessen the amount of data contained within the indexes on a table. This often involved partitioning the tables in SQL Server 2005 or archiving data in earlier versions. Any method that worked to create a smaller set of active data to be queried during each SELECT statement and a smaller index to be maintained during data modifications.
One of the great new performance improvements in SQL Server 2008 is the new filtered index. Filtered indexes basically allow you to create a filter on an index – think WHERE clause. The index B-Tree will only contain the rows of data that meet the filtering criteria. This allows you to reduce the amount of data contained in an index which means that you are also reducing the data affected in an index by data modifications.
I am going to walk through a very small and simple demo that illustrates the impact of filtered indexes.
First let’s create a demo table with some demo data
SET NOCOUNT ON
–Create demo table
IF OBJECT_ID(‘TestTable’,‘table’) IS NOT NULL
DROP TABLE TestTable
GO
–Create demo table
CREATE TABLE TestTable
(colID INT IDENTITY(1,1)
,colDate DATETIME
,colSession INT
,colDesc VARCHAR(100)
)
–Insert data into table
DECLARE @loop INT
SET @loop = 1000
WHILE @loop > 0
BEGIN
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070301’,1,‘20070301-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070401’,1,‘20070401-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070501’,1,‘20070501-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070601’,1,‘20070601-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070701’,1,‘20070701-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070801’,1,‘20070801-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070901’,1,‘20070901-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071001’,1,‘20071001-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071101’,1,‘20071101-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071201’,1,‘20071201-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080101’,1,‘20080101-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080201’,1,‘20080201-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080301’,1,‘20080301-1’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070301’,2,‘20070301-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070401’,2,‘20070401-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070501’,2,‘20070501-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070601’,2,‘20070601-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070701’,2,‘20070701-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070801’,2,‘20070801-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070901’,2,‘20070901-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071001’,2,‘20071001-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071101’,2,‘20071101-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071201’,2,‘20071201-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080101’,2,‘20080101-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080201’,2,‘20080201-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080301’,2,‘20080301-2’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070301’,3,‘20070301-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070401’,3,‘20070401-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070501’,3,‘20070501-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070601’,3,‘20070601-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070701’,3,‘20070701-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070801’,3,‘20070801-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20070901’,3,‘20070901-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071001’,3,‘20071001-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071101’,3,‘20071101-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071201’,3,‘20071201-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080101’,3,‘20080101-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080201’,3,‘20080201-3’ )
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080301’,3,‘20080301-3’ )
SET @loop = @loop – 1
END
Next we will create a clustered Index and a non-filtered nonclustered index
–Create clustered index on table
CREATE CLUSTERED INDEX cl_TestTable_1 ON TestTable(colID)
–Create nonclustered index on table
CREATE NONCLUSTERED INDEX ncl_TestTable_1 ON TestTable(colDate) INCLUDE (colSession)
After creating the indexes, let’s take a look at the number of rows in the indexes
–View number of rows in indexes
SELECT name, i.index_id, [rows]
FROM sys.partitions p
INNER JOIN sys.indexes i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE p.[object_id] = OBJECT_ID(‘TestTable’)
You should get an output similar to the one below
/*
name index_id rows
cl_TestTable_1 1 39000
ncl_TestTable_1 2 39000
*/
Now let’s create a filtered index and look at the number of rows of data
–Create filtered index
CREATE NONCLUSTERED INDEX ncl_TestTable_2 ON TestTable(colDate) INCLUDE (colSession)
WHERE colDate > ’12/31/2007 23:59:59′
–View number of rows in indexes
SELECT name, i.index_id, [rows]
FROM sys.partitions p
INNER JOIN sys.indexes i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE p.[object_id] = OBJECT_ID(‘TestTable’)
You should get an output similar to the one below
/*
name index_id rows
cl_TestTable_1 1 39000
ncl_TestTable_1 2 39000
ncl_TestTable_2 3 9000
*/
Once we are done with creating the infrastructure, let’s look at how they affect performance.
–How does a filter index help SELECTS
–Clear cache for demo
CHECKPOINT
DBCC DROPCLEANBUFFERS
–Get execution plan and IO statistics
SET STATISTICS PROFILE ON
SET STATISTICS IO ON
–Select by date
–This has a bookmark lookup
SELECT * FROM TestTable WITH (INDEX = 2) WHERE colDate > ’12/31/2007 23:59:59′
–Table ‘TestTable’. Scan count 1, logical reads 19692
–TotalSubtreeCost: 2.11023
–Use filtered index
–This has a bookmark lookup
SELECT * FROM TestTable WITH (INDEX = 3) WHERE colDate > ’12/31/2007 23:59:59′
–Table ‘TestTable’. Scan count 1, logical reads 18598
— TotalSubtreeCost: 2.096156
–Select by date
–This does not have a bookmark lookup
SELECT colID, colDate, colSession FROM TestTable WITH (INDEX = 2) WHERE colDate > ’12/31/2007 23:59:59′
–Table ‘TestTable’. Scan count 1, logical reads 27
— TotalSubtreeCost: 0.03095978
–Use filtered index
–This does not have a bookmark lookup
SELECT colID, colDate, colSession FROM TestTable WITH (INDEX = 3) WHERE colDate > ’12/31/2007 23:59:59′
–Table ‘TestTable’. Scan count 1, logical reads 27
— TotalSubtreeCost: 0.0168857
–Turn off execution plans
SET STATISTICS PROFILE OFF
SET STATISTICS IO OFF
–Clear cache for demo
CHECKPOINT
DBCC DROPCLEANBUFFERS
–Insert new rows of data
–Will affect filtered index
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20080301’,3,‘20080301-3’)
–View number of rows in indexes
SELECT name, i.index_id, [rows]
FROM sys.partitions p
INNER JOIN sys.indexes i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE p.[object_id] = OBJECT_ID(‘TestTable’)
/*
name index_id rows
cl_TestTable_1 1 39001
ncl_TestTable_1 2 39001
ncl_TestTable_2 3 9001
*/
–Determine which pages have been changed
SELECT DB_NAME(bd.database_id) AS ‘Database Name’
,dtab.object_name AS ‘Object Name’
,dtab.index_id AS ‘Index number with dirty page’
,bd.page_id AS ‘Page ID of dirty page’
,bd.page_type AS ‘Type of Page’
,bd.page_level AS ‘Page Level’
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN
(
SELECT OBJECT_NAME([object_id]) AS object_name
,index_id ,allocation_unit_id
FROM sys.allocation_units AS au
INNER JOIN sys.partitions AS p
ON au.container_id = p.hobt_id
AND (au.type = 1 OR au.type = 3)
UNION ALL
SELECT OBJECT_NAME([object_id]) AS object_name
,index_id, allocation_unit_id
FROM sys.allocation_units AS au
INNER JOIN sys.partitions AS p
ON au.container_id = p.partition_id
AND au.type = 2
) AS dtab
ON bd.allocation_unit_id = dtab.allocation_unit_id
WHERE bd.is_modified = 1
AND bd.database_id = DB_ID()
AND [object_name] = ‘TestTable’
ORDER BY bd.database_id, [Object Name],index_id, page_type
/*
Database Name |
Object Name |
Index number with dirty page |
Page ID of dirty page |
Type of Page |
Page Level |
AdventureWorks2005 |
TestTable |
1 |
19578 |
DATA_PAGE |
0 |
AdventureWorks2005 |
TestTable |
2 |
20033 |
INDEX_PAGE |
0 |
AdventureWorks2005 |
TestTable |
3 |
20120 |
INDEX_PAGE |
0 |
*/
–Clear cache for demo
CHECKPOINT
DBCC DROPCLEANBUFFERS
–Will not affect filtered index
INSERT INTO TestTable (colDate,colSession,colDesc ) VALUES (‘20071101’,3,‘20071101-3’)
–View number of rows in indexes
SELECT name, i.index_id, [rows]
FROM sys.partitions p
INNER JOIN sys.indexes i
ON p.[object_id] = i.[object_id]
AND p.index_id = i.index_id
WHERE p.[object_id] = OBJECT_ID(‘TestTable’)
/*
name index_id rows
cl_TestTable_1 1 39002
ncl_TestTable_1 2 39002
ncl_TestTable_2 3 9001
*/
–Determine which pages have been changed
SELECT DB_NAME(bd.database_id) AS ‘Database Name’
,dtab.object_name AS ‘Object Name’
,dtab.index_id AS ‘Index number with dirty page’
,bd.page_id AS ‘Page ID of dirty page’
,bd.page_type AS ‘Type of Page’
,bd.page_level AS ‘Page Level’
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN
(
SELECT OBJECT_NAME([object_id]) AS object_name
,index_id ,allocation_unit_id
FROM sys.allocation_units AS au
INNER JOIN sys.partitions AS p
ON au.container_id = p.hobt_id
AND (au.type = 1 OR au.type = 3)
UNION ALL
SELECT OBJECT_NAME([object_id]) AS object_name
,index_id, allocation_unit_id
FROM sys.allocation_units AS au
INNER JOIN sys.partitions AS p
ON au.container_id = p.partition_id
AND au.type = 2
) AS dtab
ON bd.allocation_unit_id = dtab.allocation_unit_id
WHERE bd.is_modified = 1
AND bd.database_id = DB_ID()
AND [object_name] = ‘TestTable’
ORDER BY bd.database_id, [Object Name],index_id, page_type
/*
Database Name |
Object Name |
Index number with dirty page |
Page ID of dirty page |
Type of Page |
Page Level |
AdventureWorks2005 |
TestTable |
1 |
19578 |
DATA_PAGE |
0 |
AdventureWorks2005 |
TestTable |
2 |
3012 |
IAM_PAGE |
0 |
AdventureWorks2005 |
TestTable |
2 |
19928 |
INDEX_PAGE |
1 |
AdventureWorks2005 |
TestTable |
2 |
3398 |
INDEX_PAGE |
0 |
AdventureWorks2005 |
TestTable |
2 |
20002 |
INDEX_PAGE |
0 |
AdventureWorks2005 |
TestTable |
2 |
20001 |
INDEX_PAGE |
0 |
*/
As you can see, having a filtered index makes a little difference in this small table. The difference you would see will grow as the size of the data grows and as the data distribution changes. Redo the demo but change the filtered index to only include the last month of data instead of the last three months. Redo the demo by adding 1,000,000 rows of data before 12/31/2007. How do these two tests differ from the one I created?