0

I have joined a new project which is quite complex. We are using SQL Server 2012 and has many databases(15+). Each table in each database has many columns(more than 25 - 50).
There are thousands of records in each table. I want to view the data in the best possible manner. However, there are nulls for some columns. It has become quite difficult for me to get an idea on the data exists in each table.

Here is my question for sample data.

Id |Column1 |Column2   | Column3 |Column4 |............|ColumnNth
1  |  sam   |*NULL*    |   2     | *NULL* |............|mike  
2  |  jack  |2014-02-03| *NULL*  |   6    |............|*NULL*  
3  |  Rav   | *NULL*   | *NULL*  |  *NULL*|............|Jim  
4  | *NULL* |2013-09-08|   7     |  922   |............|Rocky  
5  |  Tom   |2014-10-08|   11    |  212   |............|Mary  

...
...
so on...

For example, below is the count of NULLs of all the columns for each rows. [It is not a table in the database. "Count of NULLs" is not a column of the initial table. I have shown this just for easy explanation]

ID| Count of NULLs  
1 | 5  
2 |2  
3 |10  
4 |1  
5 |0  
6 | 1  
7 | 2  
8 | 3  

...
...
and so on..
Now I want to view the data with less number of NULLs in the top. The order of data should be in the below order. I have shown only the order of IDs so that not to type all the data(columns) again.

ID 
5  
4  
6  
2  
7  
8  
1  

...
...
3
...
so on...

Explanation of output:
1. The total number of nulls for all the columns for row(ID = 5) is ZERO. This should be first.
2. The total number of nulls for all the columns for row(ID = 4) and row(ID = 6) is ONE. Rows with ID = 4 and ID = 6 should be followed by ID = 5. ... so on...

In the output, I don't want NULL to be replaced by other value like 0 or 1 or blank value etc. So NULL should be displayed as it is in the output if it exists in the initial table.
I want to view all the columns

Please give me a stored procedure or query or set of queries which is generic. As the number of columns are more than 25-40, the SP or query should work without specifying the column names. If it is Stored procedure, then the Table name would be the input for it.
I assume that it is quite complex to write the SP for this requirement. But what I have asked is a real time situation where a new developer faces when he/she joins a support/maintanence project with complex and huge databases that was already built and contains huge data. Thanks

Sanjay M
  • 5
  • 2
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders Nov 02 '14 at 20:04
  • Do you have a query written for a single example? I'd suggest starting with that first using hard-coded column names so that you can form an idea of what a dynamic solution would look like. – Yuck Nov 02 '14 at 20:06
  • Hi Yuck, No, I don't have an example for one table. I tried to get the solution but it would be difficult to write a query or SP(dynamic) for this requirement. I have updated my question. If the solution is a stored procedure, then Table name would be input for it. Thanks. – Sanjay M Nov 02 '14 at 21:24
  • @SanjayM Could you just get real please? "as many databases(15+)." - make that 150+ and you talk "normal" for a non trivial product. My current small project has maybe 30 schemata and each has more than 15 tables. Same to your field count (small) and your row count. THOUSANDS? Heck, 20 years ago a million rows was small. Today "large" is in billions. – TomTom Nov 03 '14 at 12:49

2 Answers2

1

Below is an example you can extend as needed for you actual table.

SELECT 
      Id
    , (SELECT COUNT(*) 
       FROM (VALUES(Column1), (Column2), (Column3), (Column4)) AS AllColumns(ColumnValue) 
       WHERE AllColumns.ColumnValue IS NULL) AS NullCount
FROM dbo.SampleTable
ORDER BY NullCount;

If the source table schema may vary, you'll need to use dynamic SQL:

DECLARE @ColumnList nvarchar(MAX) = 
    (SELECT
    STUFF((SELECT ',(' + name + ')'
    FROM sys.columns
    WHERE
        object_id = OBJECT_ID(N'dbo.SampleTable')
        AND name <> 'Id'
    FOR XML PATH('')),1 ,1, '')
    );

DECLARE @SQL nvarchar(MAX) = N'
    SELECT 
          Id
        , (SELECT COUNT(*) 
           FROM (VALUES' + @ColumnList + ') AS AllColumns(ColumnValue) 
           WHERE AllColumns.ColumnValue IS NULL) AS NullCount
    FROM dbo.SampleTable
    ORDER BY NullCount;';

EXEC sp_executesql @SQL;

EDIT (added explanation and SQL 2005 method):

The column list is generated by querying the sys.columns catalog view for all columns in the specified table, except for Id (you can omit other columns as needed using this same technique). Each included column is enclosed in parenthesis and delimited by a comma to build the string needed for the VALUES clause. The FOR XML clause returns this list a single string and the extraneous leading comma is removed with the STUFF function so that the @ColumnList variable contains the string needed for the VALUES clause. The final SQL statement is built by concatenating the rest of the SELECT statement.

Table-valued row constructors (http://msdn.microsoft.com/en-us/library/dd776382.aspx) allow one to specify one or more rows via a VALUES clause. This is used in the final query to pivot the column list of each source row into rows in order to use the COUNT aggregate function.

The techniques used here (row constructors and initializing variables on the DECLARE statement) were introduced in SQL Server 2008 so this will not work in earlier SQL Server versions. The version below uses a derived table instead of table-valued row constructor, which will work with SQL 2005.

DECLARE @ColumnList nvarchar(MAX) = 
    (SELECT
    STUFF((SELECT ' UNION ALL SELECT ' + name
    FROM sys.columns
    WHERE
        object_id = OBJECT_ID(N'dbo.SampleTable')
        AND name <> 'Id'
    FOR XML PATH('')),1 ,11, '')
    );

DECLARE @SQL nvarchar(MAX) = N'
    SELECT 
          Id
        , (SELECT COUNT(*) 
           FROM (' + @ColumnList + ') AS AllColumns(ColumnValue) 
           WHERE AllColumns.ColumnValue IS NULL) AS NullCount
    FROM dbo.SampleTable
    ORDER BY NullCount;';

EXEC sp_executesql @SQL;
Dan Guzman
  • 43,250
  • 3
  • 46
  • 71
  • Hi Dan, I don't want the column names to be specified. The Stored procedure or query should be dynamic. If it is Stored Procedure, then table name would be input for it. Thanks. – Sanjay M Nov 02 '14 at 21:03
  • I added a dynamic SQL example to my answer. – Dan Guzman Nov 02 '14 at 22:52
  • Thanks Dan. I have tried to execute the queries but throwing below errors. I couldn't resolve these errors. Msg 102, Level 15, State 1, Line 2 Incorrect syntax near '('. Msg 139, Level 15, State 1, Line 0 Cannot assign a default value to a local variable. Msg 137, Level 15, State 2, Line 20 Must declare the scalar variable "@SQL". – Sanjay M Nov 03 '14 at 09:26
  • It would be helpful for me if you can explain the logic. – Sanjay M Nov 03 '14 at 09:28
  • @SanjayM, the script I posted should work on SQL 2008 and later. Are you using SQL 2012 as the tag indicates? I'm edit my answer to be compatible with SQL 2005 by not initializing the local variable in the `DECLARE` statement and add additional explanation. – Dan Guzman Nov 03 '14 at 12:20
  • @SanjayM, the script I posted will only work in SQL 2008 and later for the reasons I specified in by answer. – Dan Guzman Nov 03 '14 at 12:46
  • Thanks for your reply. The SSMS client in my desktop is SQL 2012 but I am connecting to the actual database server in development environment which is SQL 2005. Is there a way I can get the solution for SQL 2005? – Sanjay M Nov 03 '14 at 13:53
  • It's the database server running the SQL that matters here. I added a SQL 2005 example. – Dan Guzman Nov 04 '14 at 01:37
0

There is no pure SQL query that will count the NULL columns in your table, without stating the name of the columns.

If you can use procedures, you can get the column names and dynamically build a query by taking column names from INFORMATION_SCHEMA.COLUMNS and concatenating them like the below query (or other alternatives stated here):

SELECT A.*,
  ( (CASE WHEN Column1 IS NULL THEN 1 ELSE 0 END)
  + (CASE WHEN Column2 IS NULL THEN 1 ELSE 0 END)
  + (CASE WHEN Column3 IS NULL THEN 1 ELSE 0 END)
  .
  .
  .
  + (CASE WHEN ColumnN IS NULL THEN 1 ELSE 0 END)) AS NULL_CNT
FROM MYTABLE A
ORDER BY NULL_CNT DESC
Community
  • 1
  • 1
Yigitalp Ertem
  • 1,901
  • 24
  • 27
  • Hi yalpertm, if it could not be achieved with set of queries, I want the stored procedure which can give me the required result. I assume that it is quite complex to write the SP for this requirement. But what I have asked is a real time situation where a new developer faces when he/she joins a support/maintanence project with complex and huge databases that was already built and contains huge data. – Sanjay M Nov 02 '14 at 20:48
  • I have edited my question. If it is Stored procedure, it is fine for me. Table name will be input for the SP. – Sanjay M Nov 02 '14 at 20:57