Starting with SQL Server 2012, XQuery functions are surrogate-aware. SQL Server Unicode data types, including NCHAR, NVARCHAR, and XML, encode text in UTF-16 format. The Unicode standard assigns each character a unique code point, a value in the range 0x0000 to 0x10FFFF. Most commonly used characters fit into a single 16-bit word. Characters with code point values larger than 0xFFFF require two consecutive 16-bit words, that is, four bytes in total. These characters are called supplementary characters, and the two consecutive 16-bit words are called a surrogate pair.
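
The arithmetic behind a surrogate pair can be sketched as follows. This is a Python illustration of the UTF-16 encoding rule, not SQL Server code. The helper names to_surrogate_pair and from_surrogate_pair are hypothetical. The code subtracts 0x10000 from a supplementary code point and splits the remaining 20 bits into a high surrogate (0xD800 to 0xDBFF) and a low surrogate (0xDC00 to 0xDFFF).

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one code point (hypothetical helper)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F600)
    print(hex(high), hex(low))                    # two 16-bit words
    print(len("\U0001F600".encode("utf-16-le")))  # four bytes in total
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00. Python's own UTF-16 encoder produces the same two words, which occupy four bytes.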

The W3C recommendation for XQuery functions and operators requires them to count a surrogate pair as a single character. In SQL Server versions prior to 2012, XQuery string functions did not recognize a surrogate pair as a single character. As a result, operations such as string length calculations returned incorrect results for strings that contain supplementary characters.
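
The difference between the two counting behaviors can be sketched in Python. This sketch does not reproduce SQL Server's implementation. The function names are hypothetical. utf16_words lists the 16-bit words of a string. count_characters counts a well-formed high/low surrogate pair as one character, as the W3C recommendation requires, and counts any unpaired surrogate as one character on its own. Counting raw words instead gives the surrogate-unaware result.

```python
def utf16_words(s: str) -> list[int]:
    """Return the UTF-16 (little-endian) 16-bit words of s (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def count_characters(words: list[int]) -> int:
    """Surrogate-aware count: a high surrogate followed by a low surrogate is one character."""
    count = 0
    i = 0
    while i < len(words):
        is_high = 0xD800 <= words[i] <= 0xDBFF
        if is_high and i + 1 < len(words) and 0xDC00 <= words[i + 1] <= 0xDFFF:
            i += 2  # well-formed surrogate pair
        else:
            i += 1  # ordinary character or unpaired surrogate
        count += 1
    return count


if __name__ == "__main__":
    words = utf16_words("A\U0001F600B")
    print(len(words))              # surrogate-unaware count of 16-bit words
    print(count_characters(words)) # surrogate-aware character count
```

For the string "A", U+1F600, "B", the surrogate-unaware count is 4 words. The surrogate-aware count is 3 characters.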

The XML data type in SQL Server allows only well-formed surrogate pairs. However, it is possible to pass invalid or partial surrogate pairs to XQuery functions as string values. Below is an example of providing surrogate pairs to an XML data type variable through string values. The value of the CustomerName element is “