I have a large file that contains records with a particular structure. I want to know the top 10 most commonly occurring values for a particular field in that structure. Will I be able to do it in a single parse of the file?
Why is this information insufficient? My question is mostly theoretical, regarding the algorithm. – liv2hak Apr 12 '12 at 06:08
1 Answer
You'll need to store and update an associative array that maps each field value to its number of occurrences. Depending on how many distinct values there are, memory will be the limiting factor.
After that's done, sort the array by occurrence count and take the first 10 entries.
AFAIK, C does not include an associative array data type, so you'll need to use a 3rd party library; see "Looking for a good hash table implementation in C" for some options.
As for sorting, there is qsort: http://linux.die.net/man/3/qsort
So ignoring possible memory requirements, you can do it in one pass.
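To make the counting and sorting steps concrete, here is a minimal sketch. It assumes the field values are NUL-terminated strings and uses a small hand-rolled chained hash table in place of a 3rd party library. The names `Counter`, `counter_add`, `counter_top` and the table size `NBUCKETS` are made up for illustration and do not come from any particular library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 65536  /* assumed size; tune to the expected number of distinct values */

typedef struct Entry {
    char *key;            /* field value */
    unsigned long count;  /* occurrences seen so far */
    struct Entry *next;   /* next entry in the same bucket */
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];
    size_t distinct;      /* number of distinct keys stored */
} Counter;

/* djb2-style string hash */
static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Increment the count for key. Returns 0 on success, -1 if out of memory. */
int counter_add(Counter *c, const char *key) {
    assert(c != NULL && key != NULL);
    Entry **slot = &c->buckets[hash_str(key) % NBUCKETS];
    for (Entry *e = *slot; e; e = e->next)
        if (strcmp(e->key, key) == 0) { e->count++; return 0; }
    Entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->key = malloc(strlen(key) + 1);
    if (!e->key) { free(e); return -1; }
    strcpy(e->key, key);
    e->count = 1;
    e->next = *slot;      /* push onto the front of the bucket chain */
    *slot = e;
    c->distinct++;
    return 0;
}

/* qsort comparator: higher counts first, ties broken by key order */
static int by_count_desc(const void *a, const void *b) {
    const Entry *x = *(const Entry *const *)a;
    const Entry *y = *(const Entry *const *)b;
    if (x->count != y->count) return (x->count < y->count) ? 1 : -1;
    return strcmp(x->key, y->key);
}

/* Write up to k most frequent entries into out[]; returns how many were written. */
size_t counter_top(const Counter *c, const Entry **out, size_t k) {
    if (c->distinct == 0) return 0;
    const Entry **all = malloc(c->distinct * sizeof *all);
    if (!all) return 0;
    size_t n = 0;
    for (size_t i = 0; i < NBUCKETS; i++)
        for (const Entry *e = c->buckets[i]; e; e = e->next)
            all[n++] = e;
    qsort(all, n, sizeof *all, by_count_desc);
    if (k > n) k = n;
    memcpy(out, all, k * sizeof *all);
    free(all);
    return k;
}
```

The idea would be to allocate a zeroed `Counter` (for example with `calloc`), call `counter_add` once per record while reading the file, and call `counter_top(c, top, 10)` at the end. Memory use grows with the number of distinct values rather than with the size of the file, which is the limitation mentioned above.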
-
The file I am talking about is huge, and storing every single occurrence in memory is almost impossible. My question was primarily whether there is a way around storing every single value. Say I store only the first 10 or 20: this wouldn't work, since the second half of the file could all be one value, and that would screw up my calculation. – liv2hak Apr 12 '12 at 06:33
-
Yes, you would need to go through the file and keep a count for every distinct field value. I know you requested a single parse, but have you considered map reduce by any chance? :) – gak Apr 12 '12 at 06:44
-
Come to think of it, map reduce could be a very good option for this algorithm. Your suggestion is appreciated. Thanks. – liv2hak Apr 12 '12 at 09:29