I have a 50 GB text file of random strings, and I want to count the number of occurrences of a substring in that file. I need to do this many times, for different random substrings that are not known in advance.
I was wondering if there is another way to approach the problem than scanning the whole file for every query, for example:
A probabilistic way?
Something like a Bloom filter, but instead of a probabilistic membership check it would do probabilistic counting. That data structure would be used for count estimates.
Some other statistical method?
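To make the Bloom-filter-like idea concrete, here is a rough sketch of what I have in mind: a count-min sketch fed with every k-character window of the text. Everything here is my own assumption, not an established solution: the names `CountMinSketch` and `build_kgram_sketch`, the `width` and `depth` values, and above all the restriction that queries must have exactly length `k`.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates can be too high, never too low."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-looking hash per row, obtained by salting BLAKE2b.
        data = item.encode("utf-8")
        for row in range(self.depth):
            salt = row.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.rows[row][col] += 1

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._cells(item))


def build_kgram_sketch(text, k, width=2048, depth=4):
    """Insert every overlapping window of length k into a fresh sketch."""
    sketch = CountMinSketch(width, depth)
    for i in range(len(text) - k + 1):
        sketch.add(text[i:i + k])
    return sketch


sketch = build_kgram_sketch("abracadabra" * 3, k=3)
print(sketch.estimate("abr"))  # an upper bound on the true count
```

A query then costs `depth` hash evaluations regardless of the file size, but the structure only answers for substrings of length `k`, and the error grows as the table fills up.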
Is there any simple method I could use to estimate the number of occurrences of a string in a text file? I am open to alternatives.
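As an example of the kind of simple estimate I mean, the sketch below reads a few random blocks of the file, counts the matches that start inside them, and scales the hit rate up to the whole file. The function names, the `samples` and `block` defaults, and the assumption that the file can be treated as raw bytes are all mine. The result is only approximate, and for rare substrings it will often come out as zero.

```python
import os
import random


def count_starts(chunk: bytes, needle: bytes, max_start: int) -> int:
    """Count (possibly overlapping) matches starting at an index < max_start."""
    count = 0
    pos = chunk.find(needle)
    while pos != -1 and pos < max_start:
        count += 1
        pos = chunk.find(needle, pos + 1)
    return count


def estimate_count(path, needle: bytes, samples=1000, block=4096, seed=None):
    """Estimate how often needle occurs in the file by sampling random blocks."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    positions = size - len(needle) + 1  # number of valid start offsets
    if not needle or positions <= 0:
        return 0.0
    hits = 0
    inspected = 0
    with open(path, "rb") as f:
        for _ in range(samples):
            start = rng.randrange(positions)
            span = min(block, positions - start)  # start offsets in this block
            f.seek(start)
            # Read a little extra so matches starting near the end still fit.
            chunk = f.read(span + len(needle) - 1)
            hits += count_starts(chunk, needle, span)
            inspected += span
    # Scale the observed hit rate up to every start offset in the file.
    return hits / inspected * positions
```

The cost per query depends on `samples * block` rather than on the file size, at the price of a variance that I do not know how to bound for arbitrary data.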
It would be nice if each query could be done in logarithmic time or better, as I will be doing the same task many times.
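For comparison, the only exact approach I know of with roughly logarithmic query time is a suffix array with binary search. Below is a toy sketch on an in-memory string. The function names are mine, the naive construction is far too slow and memory-hungry for 50 GB, and I am assuming a real solution would need a compressed index or something along the lines of the estimates above.

```python
def build_suffix_array(text):
    """Start indices of all suffixes in sorted order (naive, demo only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    """Exact count of pattern in text using two binary searches over sa."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix with the given rank.
        start = sa[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


text = "banana"
sa = build_suffix_array(text)
print(count_occurrences(text, sa, "ana"))  # 2 (overlapping matches count)
```

Each query does O(log n) comparisons of at most `len(pattern)` characters, which is the query cost I am hoping an approximate method could match or beat.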