-5

I was using a MIN function to compare character data in a column that allowed nulls, with disastrous results. ;-) Here's a much simplified example that shows the same kind of thing:

  1. Determine the number of rows in sys.indexes:

    select count(*) from sys.indexes;
    
  2. Run this SQL:

    select count(name), substring(cast(min(name) as varbinary),1,1) 
      from sys.indexes;
    

    If count matches #1, stop here. Try a different database (with more objects) instead.

  3. Go to the Messages tab. You should see: Warning: Null value is eliminated by an aggregate or other SET operation.

  4. How would you handle the nulls in the "name" column in order to resolve that Warning? with coalesce?

  5. Replace "name" with "coalesce(name,char(0x7e))" and run the SQL:

    select count(coalesce(name,char(0x7e))), 
      substring(cast(min(coalesce(name,char(0x7e))) as varbinary),1,1) 
      from sys.indexes;
    
  6. Note the result of the MIN function in #5 (0x7e rather than 0x63).

Questions:

A. Is coalesce the proper way to handle the Warning and missing (null) data per #4?

B. Since the result in #6 is not what's expected, what is the correct way to compare character data in SQL Server? Convert everything to varbinary?

[Edited...]

In the discussions below, there was some confusion and discussion about the relationship between the null replacement via COALESCE and the results of the comparisons. The relationship between the two is this: when you select a string (including a single character) as a null replacement placeholder (steps #4 and #5 above), that string must satisfy the expected results of the comparison(s) that are being performed against values of other data in the query. With some collations, finding a suitable string may be more difficult than with other collations.

Rob at TVSeries.com
  • 2,397
  • 1
  • 21
  • 17
  • 2
    Just ignore the warning. It is only there as an information message. You shouldn't cast to `varbinary` to compare character data. – Martin Smith May 23 '13 at 16:07
  • If you ignore the warning, the counts are off. That's what got my attention in the first place, in the original code, which is quite a bit more involved. – Rob at TVSeries.com May 23 '13 at 16:11
  • 5
    `COUNT(col)` only counts `NOT NULL` values. Use `COUNT(*)` to count rows. I don't see how that in any way relates to comparing character data, can you tell us the original issue you were having? – Martin Smith May 23 '13 at 16:12
  • I'm only using the COUNT in the example above to see whether or not there are any rows missing. In the original code the Warning was there and the rows were missing, even though there was no COUNT in that code (remove the COUNT functions from the SQL above to see for yourself). The main issue is that after I added the COALESCE to resolve the Warning and missing rows, the MIN function no longer worked as expected. – Rob at TVSeries.com May 23 '13 at 16:19
  • 2
    So what did it return, what did you expect and why did you decide that converting to `varbinary` would resolve it? – Martin Smith May 23 '13 at 16:22
  • 2
    If you really, really don't want to see the warnings, use `SET ANSI_WARNINGS OFF;`. Can you show an actual example where "rows are missing" or "counts are off"? – Aaron Bertrand May 23 '13 at 16:22
  • BTW, this is SQL Server 2008. If it seems to be working for you, what version of SQL Server are you running that SQL in? What is the collation of that database? – Rob at TVSeries.com May 23 '13 at 16:22
  • I wouldn't want to turn off the warnings. The warning helped me identify the problem with the missing rows. – Rob at TVSeries.com May 23 '13 at 16:23
  • 5
    So again, before you started going down this odd convert to varbinary path, can you demonstrate a case where comparing character data led to "the problem with the missing rows"? – Aaron Bertrand May 23 '13 at 16:24
  • 1
    If you're trying to compare values in a column that allows nulls, what is the point of including rows where the value is NULL? Why not filter them out with `WHERE name IS NOT NULL`? Could you start over and describe exactly what you are trying to accomplish (don't include all this MIN and VARBINARY stuff) and how exactly it is going wrong? This goose chase is just confusing and I think it is overshadowing your actual problem. – Aaron Bertrand May 23 '13 at 16:29
  • Aaron - the missing rows come first, the issue with the comparison comes from resolving that with COALESCE. Here's the demonstration: https://plus.google.com/+RobertSimpson/posts/X5euetps8Eq – Rob at TVSeries.com May 23 '13 at 16:37
  • I still don't understand what you're trying to do or why. Did you see the part where Martin explained that COUNT(name) ignores NULL values? If you replace NULL with something that is NOT NULL, why are you surprised that the counts are different? One more time, with feeling: can you please explain the *ACTUAL PROBLEM* you're having, instead of this sidebar about converting NULL values to varbinary for some bizarre, yet-to-be-explained reason? – Aaron Bertrand May 23 '13 at 16:41
  • 1
    Rob, please explain what you're really trying to do. Your post misses the point that Aaron is trying to make. Why are you converting to varbinary then taking the first character? Why not just take the first character while it's already a string? – tommy_o May 23 '13 at 16:41
  • The fact that the counts change when you replace something that's NULL with something that's NOT NULL is not surprising - in fact, that's what fixes the counts per #4 above (in my demo, 110 is wrong, 116 is correct as shown by the first query from #1). I originally added the varbinary to see why it was returning unexpected results - I was expecting MIN to compare based on the binary values; since it doesn't varbinary should. I agree - ignore the sidebar about the NULL and focus on the original question about comparing data (unless my solution to the NULL problem was incorrect for some reason). – Rob at TVSeries.com May 23 '13 at 16:51
  • 1
    Please edit and clarify the original question and someone can answer. – tommy_o May 23 '13 at 16:59
  • You will see the same results if you do `COALESCE(name, 'x')` so I'm not sure what the varbinary conversion proves. And you say "focus on the original question" - can you highlight that for us? The question title talks about varbinary and so does your entire repro. What led you here? – Aaron Bertrand May 23 '13 at 16:59
  • I give up. You're not making sense. I may come back and re-visit this question when it no longer contains a single reference to `varbinary` and instead focuses on your *actual problem*. But I probably won't. – Aaron Bertrand May 23 '13 at 17:03
  • Me too. My boss originally suggested putting the question online. If someone answers with a simple Yes or No, I'd probably vote for that one. ;-) – Rob at TVSeries.com May 23 '13 at 17:12
  • 3
    If someone answers with a simple yes or no, the question should be deleted. – Aaron Bertrand May 23 '13 at 17:13
  • @AaronBertrand, I did my best to answer all aspects of your question. – tommy_o May 23 '13 at 17:16
  • You _won't_ see the same results with "COALESCE(name, 'x')" _unless_ all of the values in your "name" column happen to be > 'x'. – Rob at TVSeries.com May 23 '13 at 20:08

2 Answers2

2

EDITED AND UNDELETED

Answer to A.: Yes, or you can use ISNULL() with the same result as COALESCE() in this case.

Answer to B.: Do not convert a varchar to a varbinary to compare them, but understand the collation sort order when using aggregates.

I think this code snippet answers the count with NULL problem, but I'm still a little confused about the question:

select count(*) from sys.indexes; 
-- 697 results
go
select count(isnull(name,'')) from sys.indexes; 
-- 697 results
go
select count(name) from sys.indexes; 
-- 567 results
go

And this gets the count of records for the MIN name field (based on collation and SQL sort order of string fields):

select  i.name
        ,subCnt.Cnt
from    (select min(name) as name from sys.indexes) as i
join    (select name, count(*) as Cnt from sys.indexes group by name) as subCnt
on      subCnt.name = i.name;

And this query explains the aggregate sort order and why the above query chooses the value returned in the name field:

select name, row_number() over (order by name) from sys.indexes order by name;

And this query shows my collation's (Latin1_General_BIN) sort order even when replacing NULLs with char(0x7E):

select  coalesce(name,char(0x7e))
        , row_number() over (order by coalesce(name,char(0x7e))) 
from    sys.indexes order by 2;

And this shows the sort order difference between collations in SQL Server (which determines what is MIN or MAX in a string field):

declare @test table (oneChar char(1) collate Latin1_General_BIN
                    , oneChar2 char(1) collate SQL_Latin1_General_CP1_CI_AS
                    , varb varbinary)

insert into @test (oneChar)
select 'c' union all
select '~' union all
select 'P' union all
select 'X' union all
select 'q' union all
select NULL

update @test set varb = cast(isnull(oneChar,char(0x7E)) as varbinary), oneChar2 = oneChar

select min(oneChar) from @test -- 'P'
select min(oneChar2) from @test -- '~'
select min(varb) from @test  -- 0x50, the varbinary equivalent of oneChar

And if you want the count of all rows and you want the MIN() of the name without considering the NULLs (and not seeing a warning, for whatever reason), use this:

select  i1.Cnt
        ,i2.name 
from    (select count(*) as Cnt from sys.indexes) as i1
        ,(select min(name) as name from sys.indexes where name is not null) as i2

And whatever you do, certainly don't cast a whole field as a different collation just to do some filtering. This question belongs in the discussion forums, not as a simple question/answer.

tommy_o
  • 3,640
  • 3
  • 29
  • 33
  • 1
    Answer `A` will change the semantics. `SELECT MIN(ISNULL(name,'')) from (SELECT 'Foo' UNION ALL SELECT NULL) T(name)` returns `''` even though it doesn't exist in the data. – Martin Smith May 23 '13 at 16:04
  • Ehh yeah, you're right. I guess I don't really understand what the OP is attempting to do. – tommy_o May 23 '13 at 16:08
  • Per question "B" in the original post, I was attempting to get MIN to return the expected result, since in the SQL in #6 (as well as other SQL using various comparison operators that I tried), it does not. Try adding the MIN function to your second and third SQL statements in your answer (with or without the substring and/or varbinary, or using other comparison functions/operators). When the counts are right, the comparisons are wrong and vice-versa. – Rob at TVSeries.com May 23 '13 at 17:18
  • 1
    Your query in number 5 is asking for the count of all records in the table (which is does accurately after converting NULLs to non-NULLs) and the first character of all name records after replacing any NULL name records with `~`. If you sort in SQL these records, tilde (0x7e) is first, so your second `substring(min(coalesce([...])` returns `~`. Your query is super weirdly written, but returning _exactly_ the correct result. It has nothing to do with comparing character data. Comparing to what? Your query just sorts it then takes the MIN(). – tommy_o May 23 '13 at 17:23
  • So if that's "exactly the correct result", you're saying that you _expect_ the MIN of 'c' 0x6c and '~' 0x7e to return the greater of the two values? For your "Answer to B", if you don't convert to varbinary, what would you use instead in order to get MIN and other comparisons to return the _lesser_ of the values? The reasons it's weirdly written is because while debugging it, I was trying to figure out what was going on "under the covers" (ie. was it using some strange internal binary values for those characters where '~' _was_ less than 'c'?). Un-weird it -- you still get the same results. – Rob at TVSeries.com May 23 '13 at 17:28
  • Sorry, I wrote without checking my results. Tilde isn't returned before other characters. I do not get 0x7e in the result of #5, I get 0x50 ('P') -- your results don't match mine. Can you post your collation? Mine is Latin1_General_BIN, SQL Server 2005. – tommy_o May 23 '13 at 17:39
  • It's SQL Server 2008 and the collation is SQL_Latin1_General_CP1_CI_AS. But your "If you sort in SQL ..." is a great observation - I hadn't tried "order by" yet. The result (SQL in next comment) is this: 0x7E 0x63 0x69 0x6E 0x71 0x73. So replacing the "comparison" in the title of the question with "order by", the question can be interpreted as "What's the best way to get 'order by' to return the expected results?" – Rob at TVSeries.com May 23 '13 at 17:49
  • select distinct substring(coalesce(name,char(0x7e)),1,1), substring(cast(coalesce(name,char(0x7e)) as varbinary),1,1) from sys.indexes order by 1; – Rob at TVSeries.com May 23 '13 at 17:50
  • It also suggests a fix. I just need to find a non alphanumeric character that sorts higher (using order by) than lower case 'z' (and any other letter or number, since the column values for my original issue were alphanumeric). Putting that in the COALESCE inside the MIN function instead of the '~' 0x7e would fix it. – Rob at TVSeries.com May 23 '13 at 17:53
  • 1
    I updated the answer. The difference is in collation's sort order and I've added a test for it. – tommy_o May 23 '13 at 17:54
  • I thought we had it - the fact that I chose a diacritical mark (tilde) as the non-null character and was using accent-sensitive comparisons. But changing the collation didn't help and replacing 0x7e with 0x7b didn't either. I will have to keep experimenting to find a collation that works. – Rob at TVSeries.com May 23 '13 at 18:04
  • select distinct substring(coalesce(name,char(0x7b)),1,1) collate sql_latin1_general_cp1_ci_ai, substring(cast(coalesce(name,char(0x7b)) as varbinary),1,1) from sys.indexes union select distinct substring(coalesce(name,char(0x7e)),1,1) collate sql_latin1_general_cp1_ci_ai, substring(cast(coalesce(name,char(0x7e)) as varbinary),1,1) from sys.indexes order by 1; – Rob at TVSeries.com May 23 '13 at 18:05
  • I still don't understand why you are even converting the nulls. MIN will just ignore NULLs, so just don't coalesce(). Then you have no problem. You can't compare a NULL to a string since NULL *has no value* -- NULL is not the equivalent of an empty string. – tommy_o May 23 '13 at 18:12
  • If you don't use COALESCE, you get that Warning about nulls and the counts are wrong. The answer was the COLLATION (thanks to your help). See the posted answer. – Rob at TVSeries.com May 23 '13 at 18:14
0

I'm assuming there is a reason you couldn't use ISNULL doing something along the lines of: ISNULL(MyField,'Some String I will know is a null')

p.s. be careful with the performance of this on large datasets in a production environment depending on what you are doing.

TDrudge
  • 747
  • 5
  • 5
  • 1
    This is the same as an answer that has already been deleted because this changes the semantics of the query. Two values that are NULL should not be converted to `some string` because there is no way to assume that those two `NULL` values *should* be equal. This only works in a case where `NULL` and `some string` mean the same thing, which isn't exactly common. – Aaron Bertrand May 23 '13 at 16:31
  • Using ISNULL instead of COALESCE produces the same results: select count(isnull(name,char(0x7e))), substring(cast(min(isnull(name,char(0x7e))) as varbinary),1,1) from sys.indexes; – Rob at TVSeries.com May 23 '13 at 16:40