SQL Server 中比较末尾带空格的字符串

Posted by 佳佳 on 2018-07-11 1 IT DB SQL Server

在 Sql Server 中,一个 nvarchar 型的字段,如果该字段值的末尾有空格,即使用不带空格的字符串去查询,也能够查询出来。

下面是摘自 SQLServer中比较末尾带有空格的字符串遇到的坑 中的例子:

declare @a nvarchar(50);set @a=N'happycat1988'
declare @b nvarchar(50);set @b=N'happycat1988 '

if(@a = @b)
    select 'True' as 直接等号比较
else
    select 'False' as 直接等号比较

if(@a like @b)
    select 'True' as like比较
else
    select 'False' as like比较

从执行结果可以看出 = 操作符认为这两个字符串是相等的; LIKE 的判断结果是对的,认为这两个字符串不相等。

直接等号比较
------
True

(1 行受影响)

like比较
------
False

(1 行受影响)

在上面的文章中只写了可配合 DATALENGTH 函数实现精确比较,但没有说明原因。

在另一片文章 详解SQL Server 比较带有尾随空格的字符串 中是这样写的:

微软的帮助中曾经提到:**ANSI 标准要求填充字符的字符串比较中使用,以使其长度匹配再进行比较。**进行填充时,char 列用空格填充,binary 列用零填充。
LIKE 谓词表达式的右侧功能具有尾随空格的值时, SQL Server 不会填充到相同的长度在两个值比较发生之前。(上面的方式五,只是用like做个测试)

文章中还补充了 SET ANSI_PADDING,但该值只影响存储时的处理,不会影响比较处理。(可以参考 MSDN SET ANSI_PADDING (Transact-SQL)

另外,在 MSDN 上找到了一个官方回复 INF: How SQL Server Compares Strings with Trailing Spaces

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, , General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.

The only exception to this rule is the LIKE predicate. When the right side of a LIKE predicate expression features a value with a trailing space, SQL Server does not pad the two values to the same length before the comparison occurs. Because the purpose of the LIKE predicate, by definition, is to facilitate pattern searches rather than simple string equality tests, this does not violate the section of the ANSI SQL-92 specification mentioned earlier.

说在 Sql Server 中,字符串比较遵循 ANSI/ISO SQL-92 标准,其字符串比较处理如下:

3) The comparison of two character strings is determined as follows:

a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.

b) The result of the comparison of X and Y is given by the collating sequence CS.

c) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When the operations MAX, MIN, DISTINCT, references to a grouping column, and the UNION, EXCEPT, and INTERSECT operators refer to character strings, the specific value selected by these operations from a set of such equal values is implementation-dependent.

Note: If the coercibility attribute of the comparison is Coercible, then the collating sequence used is the default defined for the character repertoire. See also other Syntax Rules in this Subclause, Subclause 10.4, "<character set specification>", and Subclause 11.28, "<character set definition>".

简言之,一般来说,在两个字符串进行比较时,较短的字符串要在末尾补空格到较长字符串的长度,然后再做比较。

参考

  1. SQLServer中比较末尾带有空格的字符串遇到的坑
  2. 详解SQL Server 比较带有尾随空格的字符串
  3. INF: How SQL Server Compares Strings with Trailing Spaces
  4. SQL-92
  5. SET ANSI_PADDING (Transact-SQL)
  6. nchar 和 nvarchar (Transact-SQL)
  7. =(等于)(Transact-SQL)

Buy me a coffeeBuy me a coffee