As many of you know, SQL Server 2005 introduced the ability to partition data at the table level. No longer did we have to create multiple tables and tie them back together through a partition view and check constraints on a partitioning column. In SQL Server 2005, you can create a partitioning function and partition schema and then create a table with multiple partitions. The table can be segmented into as many as 1,000 partitions but create a single table name that can be used for your queries.
Partitioned tables can offer many advantages in both performance and manageability as the table can participate in joins on partitions with other tables, have the ability to switch a large amount of data to another table without putting pressure on the transaction log, or by allowing each partition to be placed on its own drive array. Another advantage that many overlook is the ability to adjust your indexing strategy slightly to take advantage of partitions.
By default, indexes are partitioned exactly like the table. If you do not specify different partitioning functions or schema, the index will simply use the one that you created for the table. This will cause the index to be partitioned exactly like the table. You can create the indexes with different partitioning functions or schemas that are exactly alike the ones you used for the table if you wished. Either way, you are creating what is called an aligned index. Aligned indexes have individual BTrees for each table partition and the data in the individual BTrees will be aligned to the data in the table partitions.
Note: There are some restrictions you have to follow to get the indexing partitioning to work. Unique indexes (clustered or nonclustered) must have the partitioning column as part of the index key. A clustered index must contain the partitioning key. SQL Server will add the partitioning column to the clustered index key if the clustered index is created as non-unique. Nonclustered, non-unique indexes will have the partitioning column added as an INCLUDED column if it is not part of the index key.
Aligned indexes are usually the way most people will create their indexes on partitioned tables as one of the key benefits to partitioning is the ability to move (switch) a partition to another table and in order to switch partitions all the indexes on the table must be aligned. Switching a partition to another table is a very quick way to move millions of rows of data to another table in order to drop the millions of rows with a simple DROP TABLE command. This will allow you to clean out tables without having to take the hit of a DELETE statement against the main table.
The code to create a partition table and aligned indexes is actually very simple although you have to create the table on the partitioning schema which means that existing tables have to be renamed, recreated on the partitioning schema with the partitioning function, and then all the data moved back into the new partitioned table with an INSERT command. Be nice if they would give us the ability to simply ALTER existing tables when we want to start partitioning them beyond the single partition all SQL Server 2005 tables are created with.
The following code will partition a table into months. First we have to create a function that will be the basis on how we are going to partition our data.
–Create main partition function for date
CREATE PARTITION FUNCTION [pfDateRange] (DATETIME)
AS RANGE RIGHT FOR VALUES ( ‘20060301’, ‘20060401’, ‘20060501’, ‘20060601’, ‘20060701’, ‘20060801’, ‘20060901’, ‘20061001’, ‘20061101’, ‘20061201’, ‘20070101’, ‘20070201’, ‘20070301’);
GO
This particular function will partition our data into 14 different partitions. You may only see 13 named months in the functions, but we have to handle some unnamed months as well.
Partitioned functions are created either as LEFT or RIGHT functions. What this means is how the functions will handle data that falls into ranges outside the ones we explicitly state. What do we do with data that has a date before 3/1/2006 or with data that is after 3/32/2007? When we create a partition function, SQL Server actually creates a “hidden” partition. This hidden partitioned, it is not really hidden at all as you can easily see it through the catalog views and DMVs, handles data that falls outside of our named range on one side of the data. What does this mean? When we create LEFT partitions, it means that the hidden partition is created to the right of our named range. In our case, all dates that are greater than 3/31/2007 will be placed into the hidden partitioned. Dates that happen before 3/1/2006 will be placed into the same partition as 3/2006 data. With a RIGHT function, the one we have, the hidden partition is created on the left and data with dates before 3/1/2006 will be placed in the hidden partition while data with dates after 3/31/2007 will be placed into the same partition as 3/2007 data.
Once we figure out where our data will be placed, we need to create a partitioning schema to specify where each partition will reside on our disk subsystem. This is accomplished with the use of filegroups for each partition. The partitioning schema is flexible in allowing you to use a single filegroup for all the partitions, although you will lose some performance benefits of partitioning if you do this, or you can specify a different filegroup for each partition. You can even mix and match filegroups as much as you want as long as each partition named in the function, including the hidden one, has a filegroup to be placed on.
–Create partition scheme for date
–Each partition will go onto the same filegroup
CREATE PARTITION SCHEME psDateRange
AS PARTITION [pfDateRange]
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY],[PRIMARY]);
GO
Now all we have to do is create the table and put data into it.
–Create table on main DateRange partition
CREATE TABLE partitionDate
(colDate DATETIME
,colSession INT
,colDesc VARCHAR(100)
) ON psDateRange(colDate)
GO
–Insert data into table
DECLARE @loop INT
SET @loop = 5000
WHILE @loop > 0
BEGIN
INSERT INTO partitionDate
VALUES (‘20060301’,1,‘20060301-1’ )
INSERT INTO partitionDate
VALUES (‘20060401’,1,‘20060401-1’ )
INSERT INTO partitionDate
VALUES (‘20060501’,1,‘20060501-1’ )
INSERT INTO partitionDate
VALUES (‘20060601’,1,‘20060601-1’ )
INSERT INTO partitionDate
VALUES (‘20060701’,1,‘20060701-1’ )
INSERT INTO partitionDate
VALUES (‘20060801’,1,‘20060801-1’ )
INSERT INTO partitionDate
VALUES (‘20060901’,1,‘20060901-1’ )
INSERT INTO partitionDate
VALUES (‘20061001’,1,‘20061001-1’ )
INSERT INTO partitionDate
VALUES (‘20061101’,1,‘20061101-1’ )
INSERT INTO partitionDate
VALUES (‘20061201’,1,‘20061201-1’ )
INSERT INTO partitionDate
VALUES (‘20070101’,1,‘20070101-1’ )
INSERT INTO partitionDate
VALUES (‘20070201’,1,‘20070201-1’ )
INSERT INTO partitionDate
VALUES (‘20070301’,1,‘20070301-1’ )
INSERT INTO partitionDate
VALUES (‘20060301’,2,‘20060301-2’ )
INSERT INTO partitionDate
VALUES (‘20060401’,2,‘20060401-2’ )
INSERT INTO partitionDate
VALUES (‘20060501’,2,‘20060501-2’ )
INSERT INTO partitionDate
VALUES (‘20060601’,2,‘20060601-2’ )
INSERT INTO partitionDate
VALUES (‘20060701’,2,‘20060701-2’ )
INSERT INTO partitionDate
VALUES (‘20060801’,2,‘20060801-2’ )
INSERT INTO partitionDate
VALUES (‘20060901’,2,‘20060901-2’ )
INSERT INTO partitionDate
VALUES (‘20061001’,2,‘20061001-2’ )
INSERT INTO partitionDate
VALUES (‘20061101’,2,‘20061101-2’ )
INSERT INTO partitionDate
VALUES (‘20061201’,2,‘20061201-2’ )
INSERT INTO partitionDate
VALUES (‘20070101’,2,‘20070101-2’ )
INSERT INTO partitionDate
VALUES (‘20070201’,2,‘20070201-2’ )
INSERT INTO partitionDate
VALUES (‘20070301’,2,‘20070301-2’ )
INSERT INTO partitionDate
VALUES (‘20060301’,3,‘20060301-3’ )
INSERT INTO partitionDate
VALUES (‘20060401’,3,‘20060401-3’ )
INSERT INTO partitionDate
VALUES (‘20060501’,3,‘20060501-3’ )
INSERT INTO partitionDate
VALUES (‘20060601’,3,‘20060601-3’ )
INSERT INTO partitionDate
VALUES (‘20060701’,3,‘20060701-3’ )
INSERT INTO partitionDate
VALUES (‘20060801’,3,‘20060801-3’ )
INSERT INTO partitionDate
VALUES (‘20060901’,3,‘20060901-3’ )
INSERT INTO partitionDate
VALUES (‘20061001’,3,‘20061001-3’ )
INSERT INTO partitionDate
VALUES (‘20061101’,3,‘20061101-3’ )
INSERT INTO partitionDate
VALUES (‘20061201’,3,‘20061201-3’ )
INSERT INTO partitionDate
VALUES (‘20070101’,3,‘20070101-3’ )
INSERT INTO partitionDate
VALUES (‘20070201’,3,‘20070201-3’ )
INSERT INTO partitionDate
VALUES (‘20070301’,3,‘20070301-3’ )
SET @loop = @loop – 1
END
At this point, we can create our indexes.
–Create clustered index on partitionDate
CREATE CLUSTERED INDEX cl_partitionDate_colDate ON partitionDate(colDate)
GO
–Create nonclustered index on partitionDate
CREATE NONCLUSTERED INDEX ncl_partitionDate_colSession_1 ON partitionDate(colSession) INCLUDE (colDesc)
GO
Now that we have data and indexes, let’s look at how the indexes have been created by looking at the partitioning catalog views and DMVs.
–View partition information
SELECT partition_id
,OBJECT_NAME(object_id) AS ‘Function name’
,index_id
,partition_number –1 is the hidden partition for RIGHT partitions
,hobt_id –Heap or BTree ID (same as the partition_id for aligned indexes)
,rows –Number of rows in each partition
FROM sys.partitions
WHERE [object_id] = (object_id(‘partitionDate’))
–Partition Information
–Some of this is redundant with the above catalog view
SELECT ‘Partition Information’
SELECT * FROM sys.dm_db_partition_stats
WHERE [object_id] = (object_id(‘partitionDate’))
–Let’s look at some different information
SELECT pf.[name]
, pf.[type] –What type of partitioning function
, pf.type_desc –What type of partitioning function
, pf.fanout –How many partitions do we have
, pf.boundary_value_on_right –1 for RIGHT, 0 for LEFT
, pf.create_date
, pf.modify_date
, rv.boundary_id
, rv.[value]
, t.[name]
, s.[name]
, pp.max_length
, pp.[precision]
, pp.scale
, pp.collation_name
, t.is_nullable
, t.is_user_defined
, t.is_assembly_type
FROM sys.partition_functions pf
INNER JOIN sys.partition_parameters pp
ON pf.function_id = pp.function_id
INNER JOIN sys.types t
ON pp.system_type_id = t.system_type_id
INNER JOIN sys.schemas s
ON t.schema_id = s.schema_id
INNER JOIN sys.partition_range_values rv
ON pp.function_id = rv.function_id
AND pp.parameter_id = rv.parameter_id
WHERE pf.name = ‘pfDateRange’
ORDER BY pf.[name], rv.boundary_id
Now that we have looked at some of the basic information about our table and partitions, play with the above code to see how you can partition the table in different ways. Try looking at the columns of the indexes we created, use the index system catalog views to look at the columns of the nonclustered index.
In the next blog post, I will take this one step further and talk about non-aligned indexes and why we may what to create them on our tables.