I am using NLTK collocations to find trigrams. 'training_set' is a string containing many lines of text.
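from nltk.collocations import TrigramCollocationFinder, TrigramAssocMeasures
trigram_measures = TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(str(training_set))
print finder.nbest(trigram_measures.pmi, 5)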
But the output I get is:
[('\xe5', '\x8d', '\xb8'), ('\xe5', '\x85', '\x8d'), ('\xe2', '\x80', '\x9c'), ('\xe2', '\x80', '\x9d'), ('\xe2', '\x80', '\xa6')]
Is this an encoding problem? How do I get normal English words?
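
For reference, here is a minimal sketch of one likely cause, assuming the issue is that from_words expects a sequence of tokens rather than a single string. Iterating over a string yields one character at a time (in Python 2, one byte of the UTF-8 text at a time), so any trigram finder would see characters instead of words. The `trigrams` helper and the sample text are hypothetical stand-ins used only for illustration; they are not NLTK code and not the original data.

```python
from __future__ import print_function


def trigrams(seq):
    """Return consecutive 3-item windows over any sequence (hypothetical helper)."""
    items = list(seq)
    return list(zip(items, items[1:], items[2:]))


# Illustrative sample text, not the original training_set.
text = u"the quick brown fox"

# Passing the string itself: iteration yields single characters.
char_trigrams = trigrams(text)

# Passing a list of tokens: iteration yields whole words.
word_trigrams = trigrams(text.split())

# A left curly quote (U+201C) is three bytes in UTF-8; these are the same
# three byte values that appear in one of the trigrams in the output above.
quote_bytes = u"\u201c".encode("utf-8")

print(char_trigrams[:2])
print(word_trigrams)
print(repr(quote_bytes))
```

If this is indeed the cause, decoding the text to Unicode and tokenizing it first (for example with training_set.split() or nltk.word_tokenize) and then passing the resulting list to from_words should give word trigrams instead of byte trigrams.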