I have a number of text files. I would like to use NLTK to preprocess them and write the vocabulary to a plain-text .txt file, so that I can distribute those files for other people to use. Here is what I did, starting with a single file:

import nltk

file1 = open("path/to/text/file", "r")
raw = file1.read()
tokens = nltk.wordpunct_tokenize(raw)
words = [w.lower() for w in tokens]
vocab = sorted(set(tokens))

Now I would like to write the list of items in vocab to a plain-text, human-readable .txt file. How would I do it?

gsamaras
thetna
  • One mistake in your question is that after lowercasing the words, you are not using them to build your vocab. – CKM Mar 24 '17 at 06:00

1 Answer

Write it out manually:

with open("output.txt", "w") as f:
    for item in vocab:
        f.write(item + "\n")
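
A minimal sketch of the full pipeline, assuming the vocabulary should be built from the lowercased words rather than the raw tokens. The helper names `build_vocab` and `write_vocab`, the sample sentence, and the output file name are hypothetical. The regular expression is a stand-in that should behave like `nltk.wordpunct_tokenize`, so the example runs without NLTK installed:

```python
import re
import tempfile
from pathlib import Path


def build_vocab(raw):
    """Return the sorted, lowercased, de-duplicated tokens found in raw."""
    # Stand-in for nltk.wordpunct_tokenize (an assumption): runs of word
    # characters, or runs of punctuation, become separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]+", raw)
    return sorted({t.lower() for t in tokens})


def write_vocab(vocab, path):
    """Write one vocabulary item per line to a plain-text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(item + "\n" for item in vocab)


# Hypothetical sample text; in practice raw would be read from a file.
sample = "The cat saw the Cat."
vocab = build_vocab(sample)

# Hypothetical output location in the system temp directory.
out_path = Path(tempfile.gettempdir()) / "vocab_example.txt"
write_vocab(vocab, out_path)
```

Writing one item per line keeps the file easy to read by eye and easy to load back with `f.read().splitlines()`.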
Niklas B.
brice