I'm working with large files, in this case a file with one word per line and over 300k lines. I'm trying to find the most common patterns present in the words of the file. For example, treating it as a list (small example):
a = ["122", "pass123", "dav1", "1355122"]
it should recognize that "122" is a commonly used pattern.
It is important to do this efficiently, because with that many words to check a naive approach would take far too long.
I have tried the following, which I adapted from this post: Python finding most common pattern in list of strings. But in my case it only counts the most common individual characters in the file:

from collections import Counter
from functools import reduce

# concatenates every word into one long string, then counts single characters
matches = Counter(reduce(lambda x, y: x + y, list_of_words)).most_common()
where list_of_words is a list containing all the words in the file.
Is there any way to obtain matches of at least 3 characters instead of only single characters?
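One possible approach (a minimal sketch, not necessarily the fastest for 300k words) is to count every substring of each word within a length window using a single `Counter`. The `min_len` and `max_len` parameters and the `substring_counts` name here are my own illustrative choices; counting each substring at most once per word keeps long repetitive words from dominating:

```python
from collections import Counter

def substring_counts(words, min_len=3, max_len=6):
    """Count substrings of length min_len..max_len across all words,
    counting each distinct substring at most once per word."""
    counts = Counter()
    for w in words:
        n = len(w)
        seen = set()  # distinct substrings of this word
        for length in range(min_len, min(max_len, n) + 1):
            for i in range(n - length + 1):
                seen.add(w[i:i + length])
        counts.update(seen)
    return counts

words = ["122", "pass123", "dav1", "1355122"]
print(substring_counts(words).most_common(3))  # "122" appears in 2 words
```

This is O(total characters × window size) and may still be heavy for very long words; capping `max_len` keeps the substring count per word bounded.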
Thank you all for your help :)