I have a large list of API calls stored as strings, which have been stripped of all common syntax ('http://', '.com', '.', etc.).
I would like to return a dictionary of the most common patterns that are at least 3 characters long, where the keys are the patterns found and the values are the number of occurrences of each pattern. I've tried this:
>>> calls = ['admobapioauthcert', 'admobapinewsession', 'admobendusercampaign']
>>> from itertools import takewhile, izip
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), izip(*calls)))
This returns:
'admob'
I would like it to return:
{'obap': 2, 'dmob': 3, 'admo': 3, 'admobap': 2, 'bap': 2, 'dmobap': 2, 'admobapi': 2, 'moba': 2, 'bapi': 2, 'dmo': 3, 'obapi': 2, 'mobapi': 2, 'admob': 3, 'api': 2, 'dmobapi': 2, 'dmoba': 2, 'mobap': 2, 'mob': 3, 'adm': 3, 'admoba': 2, 'oba': 2}
My current method only identifies common prefixes, but I need it to operate on all characters, regardless of their position in the string. Again, I would like to store the number of occurrences of each pattern as the dict values. (I've tried other methods to accomplish this, but they are quite ugly.)
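To make the target behavior concrete, here is a rough sketch of one way it could be done by brute force. It rests on several assumptions that the question does not state outright: a pattern is any contiguous substring, the minimum length is 3 (the desired output includes three-character keys such as 'adm'), each string contributes at most one occurrence per pattern, and only patterns found in at least 2 strings are kept. The function name `common_patterns` and its `min_len` and `min_count` parameters are made up for illustration.

```python
from collections import Counter


def common_patterns(strings, min_len=3, min_count=2):
    """Count substrings of at least min_len characters shared by several strings.

    Hypothetical helper: min_len and min_count are assumed thresholds.
    """
    counts = Counter()
    for s in strings:
        # Collect each distinct substring once per string, so a pattern
        # repeated inside one string still counts as a single occurrence.
        seen = set()
        for start in range(len(s) - min_len + 1):
            for end in range(start + min_len, len(s) + 1):
                seen.add(s[start:end])
        counts.update(seen)
    # Keep only patterns found in at least min_count strings.
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


calls = ['admobapioauthcert', 'admobapinewsession', 'admobendusercampaign']
patterns = common_patterns(calls)
```

For the three sample strings, `patterns` should match the dictionary shown above. Note that this enumerates every substring, so the work grows quadratically with string length; for a very large list it may be worth capping the maximum pattern length.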