How to get value frequencies for large data

Question

I have a table with millions of rows and 940 columns. I'm really hoping there is a way to summarize this data. I want to see frequencies for each value for EVERY column. I used this code with a few of the columns, but I won't be able to get many more columns in before the processing is too large.

SELECT 
f19_24
,f25_34
,f35_44
,f45_49
,f50_54
,f55_59
,f60_64
,count(1) AS Frequency
    FROM            
        (SELECT a.account, ntile(3) over (order by sum(a.seconds) desc) as ntile
,f19_24
,f25_34
,f35_44
,f45_49
,f50_54
,f55_59
,f60_64
        FROM demo as c
        JOIN aggregates a on c.customer_account = a.account
        WHERE a.month IN ('201804', '201805', '201806')
        GROUP BY a.account
,f19_24
,f25_34
,f35_44
,f45_49
,f50_54
,f55_59
,f60_64
        )       
    WHERE ntile = 1
    GROUP BY 
f19_24
,f25_34
,f35_44
,f45_49
,f50_54
,f55_59
,f60_64

The problem is that the GROUP BY will be far too cumbersome. Is there any other way??? It would be really helpful to be able to see where the high frequencies are in such a large dataset.

Tag your question with the database you are using. Your question also mentions one table but your code references two. Also, you have filtering in the query which is not in the question. — Gordon Linoff, Oct 06 '18 at 02:25

score 0 · Answer 1 · answered Oct 05 '18 at 23:49

Using index can help you to get much faster result in this kind of queries .The best thing to do would depend on what other fields the table has and what other queries run against that table.Without more details, a non-clustered index on month,account that included the f19_24,f25_34,f35_44,f45_49,f50_54,f55_59,f60_64 on aggregates or demo or customer(because I dont know which table includes these fields ) for example this index:

CREATE NONCLUSTERED INDEX IX_fasterquery
ON aggregates(month,accoun)
INCLUDE (f19_24,f25_34,f35_44,f45_49,f50_54,f55_59,f60_64);

That's because if you have that index in place, then SQL will not access the actual table at all when running the query, as it can find all rows with a given "month,accoun, createddate" in the index and it will be able to do that really fast as the index allows precisely for fast access when using the fields that define the key, and it will also have the "f19_24,f25_34,f35_44,f45_49,f50_54,f55_59,f60_64" value for each row and in your case by making this query as proc may you get bether result and the reason why I suggest this is here.

Thanks for this answer. I'll try this and see what happens. – SoSincere3 Oct 08 '18 at 14:11 — SoSincere3, Oct 08 '18 at 14:11
your welcome plz tell me about result after testing. – Ali Eshghi Oct 08 '18 at 21:17 — Ali Eshghi, Oct 08 '18 at 21:17

How to get value frequencies for large data

1 Answers1