Creating a vocabulary in Python

Question

I have a number of text files. I would like to use NLTK to preprocess them and write the vocabulary to a plain-text .txt file, so that I can distribute those files for other people to use. Here is what I did, starting with a single file:

import nltk

file1 = open("path/to/text/file", "r")
raw = file1.read()
tokens = nltk.wordpunct_tokenize(raw)
words = [w.lower() for w in tokens]
vocab = sorted(set(tokens))

Now I would like to write the list of items in vocab to a plain-text, human-readable .txt file. How would I do it?

One mistake in your question: after lowercasing the words, you never use them to build your vocab (it is built from `tokens`, not `words`). – CKM, Mar 24 '17 at 06:00
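To illustrate the point in the comment above, here is a minimal sketch of building the vocabulary from the lowercased tokens rather than the raw ones. The helper name `build_vocab` and the sample token list are hypothetical; in the question's code the tokens would come from `nltk.wordpunct_tokenize(raw)`.

```python
def build_vocab(tokens):
    """Return the sorted, de-duplicated list of lowercased tokens."""
    # Lowercase first, then de-duplicate, so "The" and "the" collapse
    # into a single vocabulary entry.
    return sorted({t.lower() for t in tokens})


# Hypothetical sample standing in for the tokenizer's output.
sample_tokens = ["The", "cat", "saw", "the", "Cat", "."]
vocab = build_vocab(sample_tokens)
print(vocab)  # ['.', 'cat', 'saw', 'the']
```

Note that punctuation tokens stay in the vocabulary here; filtering them out (for example with `str.isalpha`) is a separate decision.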

score 4 · Accepted Answer · edited Mar 28 '12 at 14:58

Write it out manually:

with open("output.txt", "w") as f:
    for item in vocab:
        f.write(item + "\n")

answered Mar 28 '12 at 14:57 by brice
edited Mar 28 '12 at 14:58 by Niklas B.

Or just `f.writelines(vocab)` :) – Niklas B. Mar 28 '12 at 14:58
Not quite: `writelines` does not add newlines, so the current answer is right. – alexis Mar 28 '12 at 16:00
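For what it's worth, `writelines` can still be used if the newlines are supplied explicitly. A small sketch, assuming a hypothetical `vocab` list and a temporary output path (neither is from the original posts):

```python
import os
import tempfile

# Hypothetical vocabulary; in the question it comes from the tokenized text.
vocab = ["cat", "dog", "the"]

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(path, "w", encoding="utf-8") as f:
    # writelines() writes each string as-is, so append "\n" ourselves.
    f.writelines(item + "\n" for item in vocab)

# Read the file back to confirm there is one entry per line.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

This is equivalent to the explicit loop in the answer; which one to use is mostly a matter of taste.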
