0

I'm running a query to count the number of occurrences of each value on each column.

I've managed to accomplish this by running a for loop outside from SQL, but it is quite slow.

The source table looks like this:

Column 0 Column 1 Column 2 Column 3 Column 4
-------- -------- -------- -------- --------
111      AA       BB       CC       DD      
111      AA       BBBBB    CC       DD      
222      MM       3535     678      CHARLY  
222      MM       3535     678a     CHARLY  
333      XX       8989     699      DAVIDaaa
333      XX       8989     699      DAVID   
444      YY       456      351      MIGUEL  
444      YY       456      351m     MIGUEL  
555      AA       963x     568      969zzzzz
555      AA       963      568a     969         

This counts the number of occurrences of each column based on Column 0:

select 
count(*) over ( partition by g.[Column 0] ) as Count0,
count(*) over ( partition by g.[Column 1] ) as Count1,
count(*) over ( partition by g.[Column 2] ) as Count2,
count(*) over ( partition by g.[Column 3] ) as Count3,
count(*) over ( partition by g.[Column 4] ) as Count4

FROM Diff g

WHERE [Column 0]='111'

ORDER BY [Column 0]

Which outputs:

Count0      Count1      Count2      Count3      Count4
----------- ----------- ----------- ----------- -----------
2           2           1           2           2
2           2           1           2           2

I'm manually changing the ID number on the WHERE [Column 0]='###' statement, and then inserting the results into another table using another software.

Since a value can exist several times on each column, I need the number of occurrences of each value on the subset regarding one ID, followed by the number of occurrences of each value on the subset regarding the next ID, and so on.

This works efficiently with a couple thousand of rows, but takes a lot of time when processing 200k - 300k rows and 30 columns.

I'm sure there is a smarter and more efficient way to recursively run that count-query on every distinct row on Column 0.

The result I expect looks like this:

Column 0 Column 1 Column 2 Column 3 Column 4
-------- -------- -------- -------- --------
2        2        1        2        2       
2        2        1        2        2       
2        2        2        1        2       
2        2        2        1        2       
2        2        2        2        1       
2        2        2        2        1       
2        2        2        1        2       
2        2        2        1        2       
2        2        1        1        1       
2        2        1        1        1       

Any smarter alternative to achieve this?

David
  • 1
  • 2

1 Answers1

0

I think this does what you want:

select count(*) over ( partition by g.[Column 0] ) as Count0,
       count(*) over ( partition by g.[Column 0], g.[Column 1] ) as Count1,
       count(*) over ( partition by g.[Column 0], g.[Column 2] ) as Count2,
       count(*) over ( partition by g.[Column 0], g.[Column 3] ) as Count3,
       count(*) over ( partition by g.[Column 0], g.[Column 4] ) as Count4
FROM Diff g
ORDER BY [Column 0]
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hi Sir. Your answer works on the sample data I posted since every value exists no more than 2 times on each column. The thing is that the query should count the number of occurrences of each value on an ID-basis. I just run your query on the real escenario and got each value counted considering all IDs, not the number of occurrences of each value on the subset regarding one ID, followed by the number of occurrences of each value on the subset regarding the next ID, and so on. I just edited the source table – David Sep 03 '19 at 17:40