I'm running a query to count the number of occurrences of each value on each column.
I've managed to accomplish this by running a for loop outside from SQL, but it is quite slow.
The source table looks like this:
Column 0 Column 1 Column 2 Column 3 Column 4
-------- -------- -------- -------- --------
111 AA BB CC DD
111 AA BBBBB CC DD
222 MM 3535 678 CHARLY
222 MM 3535 678a CHARLY
333 XX 8989 699 DAVIDaaa
333 XX 8989 699 DAVID
444 YY 456 351 MIGUEL
444 YY 456 351m MIGUEL
555 AA 963x 568 969zzzzz
555 AA 963 568a 969
This counts the number of occurrences of each column based on Column 0:
select
count(*) over ( partition by g.[Column 0] ) as Count0,
count(*) over ( partition by g.[Column 1] ) as Count1,
count(*) over ( partition by g.[Column 2] ) as Count2,
count(*) over ( partition by g.[Column 3] ) as Count3,
count(*) over ( partition by g.[Column 4] ) as Count4
FROM Diff g
WHERE [Column 0]='111'
ORDER BY [Column 0]
Which outputs:
Count0 Count1 Count2 Count3 Count4
----------- ----------- ----------- ----------- -----------
2 2 1 2 2
2 2 1 2 2
I'm manually changing the ID number on the WHERE [Column 0]='###'
statement, and then inserting the results into another table using another software.
Since a value can exist several times on each column, I need the number of occurrences of each value on the subset regarding one ID, followed by the number of occurrences of each value on the subset regarding the next ID, and so on.
This works efficiently with a couple thousand of rows, but takes a lot of time when processing 200k - 300k rows and 30 columns.
I'm sure there is a smarter and more efficient way to recursively run that count-query on every distinct row on Column 0.
The result I expect looks like this:
Column 0 Column 1 Column 2 Column 3 Column 4
-------- -------- -------- -------- --------
2 2 1 2 2
2 2 1 2 2
2 2 2 1 2
2 2 2 1 2
2 2 2 2 1
2 2 2 2 1
2 2 2 1 2
2 2 2 1 2
2 2 1 1 1
2 2 1 1 1
Any smarter alternative to achieve this?