I was looking at methods for splitting documents into paragraphs and came across TextTiling as one possible approach.
Here is my attempt to use it through NLTK. However, I don't understand how to work with the output: the call just returns a tokenizer object. I'd appreciate your help.
import nltk
from unidecode import unidecode
t = unidecode(doclist[0].decode('utf-8','ignore'))
nltk.tokenize.texttiling.TextTilingTokenizer(t)
Output:
<nltk.tokenize.texttiling.TextTilingTokenizer at 0x11e9c6350>
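
For reference, that output is the repr of the tokenizer itself: the constructor only takes settings, so no text has been split yet. Here is a minimal sketch of how the segments might be obtained instead, assuming the text is passed to the tokenizer's tokenize() method. The helper name tile_text and its optional tokenizer argument are my own, not part of NLTK. As far as I can tell, NLTK's implementation also expects paragraphs separated by blank lines and can raise an error on very short input.

```python
def tile_text(text, tokenizer=None):
    """Split `text` into topical segments and return them as a list of strings.

    `tile_text` is a hypothetical helper name, not part of NLTK.
    """
    if tokenizer is None:
        # Imported lazily so the helper itself does not require NLTK
        # when a tokenizer object is supplied by the caller.
        from nltk.tokenize import TextTilingTokenizer

        # The constructor takes settings (w, k, ...), not the document.
        # With no `stopwords` argument it loads NLTK's English stopword
        # list, which needs nltk.download('stopwords') to have been run.
        tokenizer = TextTilingTokenizer()

    # The text goes to tokenize(), which returns a list of segment strings.
    segments = tokenizer.tokenize(text)

    # Drop surrounding whitespace and any empty segments.
    return [seg.strip() for seg in segments if seg.strip()]
```

With the variable t from the snippet above, tile_text(t) should then give a plain Python list, so each segment can be printed or indexed like any other string.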