This is the fifth part of the solutions for interval queries in SQL Server. I know I wrote in the first one that there would be four solutions; however, I tested another one, and I think it is worth mentioning. However, if you don’t want to lose your time, please stop reading now; this blog post is showing a very inefficient solution, and you should take it as a warning what you should never do. For an introduction, please refer to the blog post Interval Queries in SQL Server Part 1. You can find the second part of the solutions in the blog post Interval Queries in SQL Server Part 2, the third part in the blog post Interval Queries in SQL Server Part 3, and the fourth part in the blog post Interval Queries in SQL Server Part 4 .

Note that you also need to read an excellent article by Itzik Ben-Gan wrote on interval queries in SQL Server by using the Relational Interval Tree model. I am using the tables and data Itzik has prepared. In order to test the solutions, you can download the code from Itzik’s article by using the link in this paragraph.I got another idea how to get the lower and upper columns in the same index by using the existing SQL Server infrastructure. I stored the interval in an XML data type column, and then used XML indexes. The idea is that XML indexes include all of the XML elements and attributes, connected to the primary key of a table. I store a very simple XML, with a single root element intervalXML with two attributes, lower and upper. Then I create the PRIMARY XML index, and on the top of it the PROPERTY secondary XML index. This index should be the optimal for the query I want to execute. According to Books Online: “If your workload retrieves multiple values from individual XML instances by using path expressions, clustering paths within each XML instance in the PROPERTY index may be helpful. This scenario typically occurs in a property bag scenario when properties of an object are fetched and its primary key value is known.” Here is the code for creating the table and indexes and populating it:CREATE
TABLE
dbo.IntervalsXML
(


id
INT
NOT
NULL,

IntervalXML
XML
NOT
NULL
);

ALTER
TABLE
dbo.IntervalsXML
ADD
CONSTRAINT
PK_IntervalsXML
PRIMARY
KEY(id);
CREATE
PRIMARY
XML
INDEX PXML_IntervalsXML

ON

dbo.IntervalsXML
(IntervalXML);
CREATE
XML
INDEX IXML_IntervalsXML_Property



ON dbo.IntervalsXML
(IntervalXML)

USING
XML
INDEX
PXML_IntervalsXML
FOR
PROPERTY;

INSERT
INTO
dbo.IntervalsXML
WITH(TABLOCK)
(id, IntervalXML)
    SELECT
id,
     CAST(N’(lower
AS
NVARCHAR(10))
+
         N'” upper=”‘ +
CAST(upper
AS
NVARCHAR(10))
+
N'” />’ AS

XML)
    FROM
dbo.Stage;
GO

 

This part took me more than 18 minutes; I was already a bit concerned about the XML solution at this point. Time to show the query and its performance!

— query
SET
STATISTICS
IO
ON;
SET

STATISTICS
TIME
ON;
GO

— middle of data
— !!! INCREDIBLY INEFFICIENT !!!
DECLARE
@l
AS
INT
= 5000000,
@u
AS
INT
= 5000020;

SELECT
id

FROM
dbo.IntervalsXML
WHERE
IntervalXML.value(‘(/intervalXML/@lower)[1]’, ‘INT’)
<=
@u
AND
IntervalXML.value(‘(/intervalXML/@upper)[1]’, ‘INT’)
>=
@l;
GO
— logical reads: 77,916,198, CPU time: 527,968 ms

 

WOW! I couldn’t believe my eyes! Remember that the original query in Itzik’s article, before any optimization, used a bit more than 11,000 logical reads. With XML and the best possible XML index, the query makes 7,000 times more logical reads! This is really incredible. XML is supported in SQL Server from version 2005, and is still that inefficient?

I contacted my friend Matija Lah, which investigated XML in SQL Server a lot and wrote some fantastic blog posts about it. Check, for example, this one: http://milambda.blogspot.com/2007/01/sql-server-2005-xml-methods-part-two.html. Matija suggested me to change the predicate to use SQL Server variables inside XQuery:

— Different predicate (by Matija Lah)
DECLARE
@l
AS
INT
= 5000000,
@u
AS
INT
= 5000020;

SELECT
id

FROM
dbo.IntervalsXML
WHERE
IntervalXML.exist(‘/intervalXML[@lower <= sql:variable(“@u”)
and @upper >= sql:variable(“@l”)]’)
= 1;
GO
— logical reads: 706,223, CPU time: 101,517 ms

 

This helped a lot, the number of logical IO reads is 100 times lower; yet, it is still 70 times higher than in the original query, before optimization, making the solution still useless. Matija did not give up yet. He suggested me to try with the selective XML indexes, which are available from SQL Server 2012 Service Pack 1. He also provided me the code:

— Trying with a selective XML index (by Matija Lah)
— Drop regular XML indexes
DROP
INDEX
IXML_IntervalsXML_Property

ON
dbo.IntervalsXML;
DROP
INDEX PXML_IntervalsXML


ON dbo.IntervalsXML;
GO
— Enabling selective XML indexes
EXEC
sys.sp_db_selective_xml_index

@dbname
=
‘IntervalsDB’
,@selective_xml_index =

‘on’
GO
— Creating the selective XML indexes

CREATE
SELECTIVE
XML
INDEX SXX_IntervalsXML

ON
dbo.IntervalsXML
(

IntervalXML

)

for (
interval

=
‘/intervalXML’
as
xquery
‘node()’,

interval_lower
=
‘/intervalXML/@lower’
as
xquery
‘xs:double’
singleton,

interval_upper
=
‘/intervalXML/@upper’
as
xquery
‘xs:double’
singleton
);
GO
CREATE
XML
INDEX
FXX_IntervalsXML_interval_lower

ON
dbo.IntervalsXML

(

IntervalXML

)

USING
XML
INDEX
SXX_IntervalsXML

FOR (

interval_lower

);
GO
CREATE
XML
INDEX
FXX_IntervalsXML_interval_upper

ON
dbo.IntervalsXML

(

IntervalXML

)

USING
XML
INDEX
sxx_dbo_IntervalsXML

FOR (

interval_upper

);
GO
— 5 min

The creation of the selective XML indexes took me about 5 minutes, which sounds better than the creation of the basic XML indexes. Time to test the query!

— middle of data
DECLARE
@l
AS
INT
= 5000000,
@u
AS
INT
= 5000020;

SELECT
id

FROM
dbo.IntervalsXML
WHERE
IntervalXML.exist(‘/intervalXML[@lower <= sql:variable(“@u”)
and @upper >= sql:variable(“@l”)]’)
= 1;
GO
— logical reads: 85,079, CPU time: 26,624 ms

 

OK, another improvement for about 8 to 9 times. So the query is only about 8 to 9 times less efficient than the query before optimization.

My conclusion is very simple: STAY AWAY FROM XML IN SQL SERVER!!! Microsoft, when introducing new features, please do not make it half-baked anymore. Either make them good, or don’t make them, and focus more on improving existing features. For example, invest more in Transact-SQL!

Finally, I want to thank Matija Lah for his help.