0

I'm in the middle of setting up a table, and I want to look at the count of how many 0 values each of the columns has. I've never used system views before... can I do something like this? (this doesn't work as entered, of course, it's just to convey intent):

SELECT t.COLUMN_NAME, (SELECT COUNT(ID) FROM tblKeyStatistics t2 WHERE 
t2.ColumnName = t.COLUMN_NAME AND t2.ColumnName = 0) AS CountOf0
FROM INFORMATION_SCHEMA.COLUMNS t
WHERE TABLE_NAME = 'tblKeyStatistics'

So output would look like:

EntValue    2
TrailingPE  1
ForwardPE   11

With each of the integers being the count of 0 values in that column in the table.

Thanks in advance...

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
StatsViaCsh
  • 2,600
  • 10
  • 44
  • 63

1 Answers1

3

No, the system views / DMVs do not track values or null/not null status for individual columns, unless there are stored in some undocumented location (which I doubt). To get this information, you'll need to run the queries against the actual objects. You can generate the queries in a more automated way, of course:

DECLARE @table sysname = N'dbo.tblKeyStatistics'; -- please use schema prefix

DECLARE @sql nvarchar(max) = N'';

SELECT @sql += CHAR(13) + CHAR(10) 
  + QUOTENAME(name) + N' =  SUM(CASE WHEN ' 
  + QUOTENAME(name) + N' = 0 THEN 1 ELSE 0 END),'
  FROM sys.columns
  WHERE [object_id] = OBJECT_ID(@table)
  -- make sure you only get numeric types:
  AND system_type_id IN (48, 52, 56, 59, 60, 62, 104, 106, 108, 122, 127);

SELECT @sql = N'SELECT ' + @sql + N' TotalRows = COUNT(*) 
  FROM ' + @table + N';';

PRINT @sql;
--EXEC sys.sp_executesql @sql;

This looks like it will result in a very expensive query, but in reality it just does a clustered index scan (much like SELECT * would do, with a little added math).

Also please stay away from the INFORMATION_SCHEMA views. They are not healthy:

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Thanks for the link to your article. You've made it very clear in it that INFORMATION_SCHEMA views are incomplete, and I agree with you entirely on that. I did suspect as much, even though I consider myself relatively new to INFORMATION_SCHEMA. Having said all that I must add that I can see nothing unhealthy in checking for a column's (non-)existence like this: `IF [NOT] EXIST (SELECT * FROM INFORMATION_SCHEMA WHERE TABLE_NAME = 'x' AND COLUMN_NAME = 'y') …`. Seems to me more natural than using `sys.columns`. (And I was quite used to the latter method before learning about INFORMATION_SCHEMA.) – Andriy M May 01 '12 at 22:28
  • Oops, missed `.COLUMNS`, sorry. It should have read `FROM INFORMATION_SCHEMA.COLUMNS`, of course. – Andriy M May 01 '12 at 22:35
  • Why do you find it "more natural"? Because the table name isn't stored there? That is quite easy to derive especially if you choose to continue to ignore schema, using `OBJECT_NAME([object_id])`. Since `INFORMATION_SCHEMA` views will continue to be neglected as new features are added to SQL Server, I find very little validity in excuses to continue using them. – Aaron Bertrand May 01 '12 at 22:38
  • The INFORMATION_SCHEMA way of querying schema still seems more intuitive (although I'd agree that *that* might be subjective). – Andriy M May 01 '12 at 22:57
  • Why are they "intuitive"? What are your criteria for "intuitive"? Because they store the table name instead of fussing with this object_id? Why is it intuitive to use these views when you can, but are forced to use the catalog views when you can't (e.g. indexes)? – Aaron Bertrand May 01 '12 at 22:59
  • Hard to explain (which may prove that my statement was indeed subjective), but using names instead of IDs seems a plausible reason indeed. A (more or less) happy choice of column names might be another one. Basically, I'm just asking myself: if I was new to databases, which method of querying schema would I be able to learn sooner, using "native" catalogs or using INFORMATION_SCHEMA? I really doubt if it would be the former, even if I don't find the sys.* catalogs all that hard to sort out. – Andriy M May 01 '12 at 23:15
  • But does that only help you because you have used INFORMATION_SCHEMA before? The way the system catalogs work the same way any properly normalized database does - you store the key once. Why should I repeat the table name in every single row for a table that has 80 columns? Don't you find that wasteful? – Aaron Bertrand May 01 '12 at 23:19
  • *But does that only help you because you have used INFORMATION_SCHEMA before?* – sorry, you've lost me there. Re "wasteful" – INFORMATION_SCHEMA is an ISO standard for metadata representation, and using text-only data approach could well be kind of middle ground among various suggestions involving numeric and/or other kind of IDs. However it may be, in my opinion, operating names directly makes INFORMATION_SCHEMA more straightforward, which in my book is one of the basic characteristics of intuitiveness. – Andriy M May 02 '12 at 01:00
  • I'm suggesting that part of the reason you find INFORMATION_SCHEMA more intuitive is that you're already familiar with them. I understand that they're ANSI standard, but in SQL Server there is more metadata than the standard covers. I know you know this because you read my article. Since in some cases I can only get the metadata information that I need from the sys catalog views, I don't understand the desire to use INFORMATION_SCHEMA when my choice is one or the other. I highly value consistency over the standard if I don't have a hope in adhering to the standard anyway. – Aaron Bertrand May 02 '12 at 01:03
  • Ah, so it is the inconsistency that is unhealthy, not the INFORMATION_SCHEMA views. Phew! That was close. You see, when you said that thing about the views not being healthy, you made it appear to me as if you considered them (or knowing them, using them) positively harmful. And that was my main point to argue against. Re "familiar with INFORMATION_SCHEMA" – yes, but I learned about system catalogs much earlier, so I'm even more familiar with *them*, and that didn't prevent my considering INFORMATION_SCHEMA more intuitive. – Andriy M May 02 '12 at 01:42
  • By not healthy I meant a few things - incomplete, inconsistent, both of which I've described already, and even Microsoft warns against their use in some of the more common cases (for example see http://msdn.microsoft.com/en-us/library/ms186224.aspx). In cases where they can't be trusted, then the implication there is, yes, they can be harmful. – Aaron Bertrand May 02 '12 at 01:46
  • Yes, reliability (and lack thereof) is a much more serious issue than e.g. intuitiveness, there's no argue about it. But after their response to your question where [you raised the issue](http://connect.microsoft.com/SQLServer/feedback/details/686118/) of INFORMATION_SCHEMA's unreliability implied by the particular phrase you were just referring me to, it may only appear a matter of poor wording in their manual(s), and so incompleteness may be the only real issue about INFORMATION_SCHEMA (inconsistency being merely a product of one's reluctance to trade straightforwardness for completeness). – Andriy M May 02 '12 at 08:01