1

I have a table in SQL Server which has 50 columns and 200 Million records. Currently, I'm hitting this table 50 times to do count on distinct column values for all columns like below:

INSERT INTO dbo.RPTS_LoadValues
(
    Column_Value,
    Record_Value,
    Record_Ct
)
SELECT 'Transaction_Dt',
       CONVERT(VARCHAR(50), Transaction_Dt),
       COUNT_BIG(1)
FROM dbo.VehicleImport
GROUP BY Transaction_Dt
UNION ALL
SELECT 'Purchase_Dt',
       CONVERT(VARCHAR(50), Purchase_Dt),
       COUNT_BIG(1)
FROM dbo.VehicleImport
GROUP BY Purchase_Dt
UNION ALL
SELECT 'PurchaseType_Cd',
       CONVERT(VARCHAR(50), PurchaseType_Cd),
       COUNT_BIG(1)
FROM dbo.VehicleImport
GROUP BY PurchaseType_Cd;

Above query run for 3 columns which is Transaction_Dt, Purchase_Dt and PurchaseType_Cd there are other 47 columns like Car_Type, Model_Ds, Model_Year etc.

Is there any other way using T-SQL, C# or Python that I hit the table only once and able to count on column's distinct record values for all columns? Please provide useful resources or links that can help.

Jay Desai
  • 821
  • 3
  • 15
  • 42
  • 1
    `COUNT_BIG(1)` is going to return the same value for every column, so why do it 50 times? – Thom A Apr 30 '20 at 21:49
  • 2
    Convert(VARCHAR(50) on 200 millions records. This should be really interesting – Steve Apr 30 '20 at 21:51
  • Shameless self promotion: [Count non-NULL rows and get definition](https://www.sqlservercentral.com/scripts/count-non-null-rows-and-get-definition) – Thom A Apr 30 '20 at 21:58
  • @Larnu, How it will return same value for every columns? Count will be different based on column values. – Jay Desai Apr 30 '20 at 21:59
  • I've also voted to close, considering the OP asks for links, rather than a solution. – Thom A Apr 30 '20 at 21:59
  • @JayDesai you're not asking to count distinct values, therefore you get a row count no matter which column you count. – Dale K Apr 30 '20 at 22:05
  • If you're after distinct values, perhaps my expanded answer [here](https://stackoverflow.com/a/60654652/2029983) is more helpful. Note that perfoming a `DISTINCT` count on every column is going to be a very expensive query. – Thom A Apr 30 '20 at 22:11
  • @Larnu, I need distinct values and their counts as well. i.e. Transaction_Dt - 4/1/20 has 10000 count, 4/2/20 has 30000, 4/3/20 has 50000 count so on. Code you are referring just get distinct value count. – Jay Desai Apr 30 '20 at 22:47

1 Answers1

0

--....for 200 mil?

select colname, colvalue, count(*) as thecounter
from
(
select 
    r.col.value('./@colname[1]', 'nvarchar(128)') as colname,
    r.col.value('./@colvalue[1]', 'nvarchar(max)') as colvalue
from
(
select  
   (
   select
    'object_id' as 'col/@colname', object_id as 'col/@colvalue', '', --column name, column value, ''
    'name' as 'col/@colname', name as 'col/@colvalue', '',
    'column_id' as 'col/@colname', column_id as 'col/@colvalue', '',
    'system_type_id' as 'col/@colname', column_id as 'col/@colvalue', '',
    'user_type_id' as 'col/@colname', user_type_id as 'col/@colvalue', '',
    'collation_name' as 'col/@colname', collation_name as 'col/@colvalue', '',
    'is_sparse' as 'col/@colname', is_sparse as 'col/@colvalue', '',
    'is_column_set' as 'col/@colname', is_column_set as 'col/@colvalue'
   for xml path(''), type
   ) as thexml
from sys.all_columns as t --table goes here
) as src
cross apply src.thexml.nodes('./col') as r(col)
) as d
group by d.colname, d.colvalue;

--verify counters
select * --thecounter=3, 3 rows
from sys.all_columns
where name = 'NodeName'
lptr
  • 1
  • 2
  • 6
  • 16