I'm trying to break a list of words (a tokenized string) into every possible contiguous substring. I'd then like to run a FreqDist over those substrings to find the most common one. The first part works fine. However, when I run the FreqDist, I get the error:
TypeError: unhashable type: 'list'
Here is my code:
import nltk
string = ['This','is','a','sample']
substrings = []
count1 = 0
count2 = 0
for word in string:
    while count2 <= len(string):
        if count1 != count2:
            temp = string[count1:count2]
            substrings.append(temp)
        count2 += 1
    count1 += 1
    count2 = count1
print substrings
fd = nltk.FreqDist(substrings)
print fd
The output of substrings is fine. Here it is:
[['This'], ['This', 'is'], ['This', 'is', 'a'], ['This', 'is', 'a', 'sample'], ['is'], ['is', 'a'], ['is', 'a', 'sample'], ['a'], ['a', 'sample'], ['sample']]
However, I just can't get the FreqDist to run on it. Any insight would be greatly appreciated. In this example each substring would only have a count of 1, but the program is meant to be run on a much larger sample of text, where substrings will repeat.
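For reference, the error most likely comes from the fact that a frequency distribution stores its samples as dictionary keys, and Python lists are not hashable. Below is a minimal sketch of one possible workaround, assuming that is the cause: store each slice as a tuple instead of a list. To keep the snippet runnable without NLTK, collections.Counter stands in for nltk.FreqDist here (an assumption for illustration), and the helper name all_substrings is made up.

```python
from collections import Counter


def all_substrings(tokens):
    """Return every contiguous run of tokens as a tuple.

    Tuples are hashable, so they can be counted; lists cannot.
    (Hypothetical helper, not part of NLTK.)
    """
    result = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            result.append(tuple(tokens[start:end]))
    return result


tokens = ['This', 'is', 'a', 'sample']
substrings = all_substrings(tokens)

# Counter is used as a stand-in for nltk.FreqDist in this sketch.
counts = Counter(substrings)
print(counts.most_common(3))
```

With the original loop, the equivalent change would be appending tuple(temp) rather than temp, so that the list passed to FreqDist contains only hashable items.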