So, I have lists of words and I need to know how often each word appears on each list. Using ".count(word)" works, but it's too slow (each list has thousands of words and I have thousands of lists).
I've been trying to speed things up with numpy. I generated a unique numerical code for each word, so I could use numpy.bincount (since it only works with integers, not strings). But I get "ValueError: array is too big".
So now I'm trying to tweak the "bins" argument of the numpy.histogram function to make it return the frequency counts I need (somehow numpy.histogram seems to have no trouble with big arrays). But so far no good. Anyone out there happens to have done this before? Is it even possible? Is there some simpler solution that I'm failing to see?