Count frequency of all words in whole corpus

Question

I am trying to count how many times each word appears in the whole corpus, but I am getting an error:

    corpus_root = os.path.abspath('../nlp_urdu/out1_data')
    mycorpus = nltk.corpus.reader.TaggedCorpusReader(corpus_root,'.*')
    noun=[]
    count_freq = defaultdict(int)
    for infile in (mycorpus.fileids()):
        print(infile)
    for i in (mycorpus.tagged_sents()):
        texts = [word for word, pos in i  if (pos == 'NN' )]
        noun.append(texts)
        count_freq[noun]+= 1
        print(count_freq)

The error I am getting is:

count_freq[noun]+= 1

TypeError: unhashable type: 'list'
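
The error can be reproduced without NLTK. The following is a minimal sketch using only the standard library, with made-up sample words: dictionary keys must be hashable, so a list is rejected as a key while a plain string is accepted.

```python
from collections import defaultdict

count_freq = defaultdict(int)
noun = [["kitab", "ghar"]]  # made-up sample: a list of lists, as built in the question

try:
    count_freq[noun] += 1  # a list is mutable and therefore unhashable
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Strings are hashable, so counting word by word works.
for word in noun[0]:
    count_freq[word] += 1

print(dict(count_freq))
```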

If you're mining text, you should look at a CountVectorizer or TFIDF. — cs95, Oct 26 '17 at 18:06

score 0 · Answer 1 · answered Oct 26 '17 at 18:09

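texts is a list of nouns, and a list cannot be used as a dictionary key.
Each key of count_freq must be a single noun (a string), so loop over texts and increment the count for each noun: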

    import os
    from collections import defaultdict

    import nltk

    corpus_root = os.path.abspath('../nlp_urdu/out1_data')
    mycorpus = nltk.corpus.reader.TaggedCorpusReader(corpus_root, '.*')
    count_freq = defaultdict(int)

    # List the files that the reader picked up.
    for infile in mycorpus.fileids():
        print(infile)

    # Go through every tagged sentence and count each noun individually.
    for sent in mycorpus.tagged_sents():
        texts = [word for word, pos in sent if pos == 'NN']
        for noun in texts:
            count_freq[noun] += 1

    print(count_freq)
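
The same counting can also be written with collections.Counter. This is only a sketch: count_tagged is a hypothetical helper name, and sample_sents is made-up data standing in for mycorpus.tagged_sents(), so that it runs without the corpus.

```python
from collections import Counter


def count_tagged(tagged_sents, tag="NN"):
    """Count every word carrying `tag` across an iterable of tagged sentences."""
    return Counter(
        word
        for sent in tagged_sents
        for word, pos in sent
        if pos == tag
    )


# Hypothetical sample data in the (word, tag) shape that tagged_sents() yields.
sample_sents = [
    [("kitab", "NN"), ("acchi", "JJ"), ("hai", "VB")],
    [("ghar", "NN"), ("kitab", "NN")],
]

noun_counts = count_tagged(sample_sents)
print(noun_counts.most_common(2))
```

With the real corpus, the call would be count_tagged(mycorpus.tagged_sents()). Counter also provides most_common(), which is convenient for inspecting the most frequent nouns.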

answered Oct 26 '17 at 18:09

Indent


Actually, I kept only the words that are nouns, which are in "texts", and then appended all the nouns from the corpus to a list named noun. – user3778289 Oct 26 '17 at 18:20
This is not showing the correct output. There are 13000 nouns, but it is just repeating the one file. – user3778289 Oct 26 '17 at 18:25
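
One way to check whether every file contributes to the totals is to count nouns per file and then sum the results. The following is a sketch under assumptions: count_per_file is a hypothetical helper, and FakeReader (with made-up file names and words) is a stand-in so the example runs without a corpus. It relies only on fileids() and tagged_sents(fileids=...), which TaggedCorpusReader provides.

```python
from collections import Counter


def count_per_file(reader, tag="NN"):
    """Return {fileid: Counter of words tagged `tag`} for each file in the reader."""
    per_file = {}
    for fileid in reader.fileids():
        per_file[fileid] = Counter(
            word
            for sent in reader.tagged_sents(fileids=fileid)
            for word, pos in sent
            if pos == tag
        )
    return per_file


class FakeReader:
    """Hypothetical stand-in for TaggedCorpusReader, backed by an in-memory dict."""

    def __init__(self, data):
        self._data = data

    def fileids(self):
        return list(self._data)

    def tagged_sents(self, fileids=None):
        # Accept a single file id or None (all files), which is all this sketch needs.
        ids = self.fileids() if fileids is None else [fileids]
        return [sent for fid in ids for sent in self._data[fid]]


reader = FakeReader({
    "a.txt": [[("kitab", "NN"), ("hai", "VB")]],
    "b.txt": [[("ghar", "NN"), ("kitab", "NN")]],
})

per_file = count_per_file(reader)
total = sum(per_file.values(), Counter())  # merge the per-file counts

for fileid, counts in per_file.items():
    print(fileid, sum(counts.values()))
print(total)
```

If one file accounts for all the nouns, or a file shows zero, the problem is more likely in how the files are read or tagged than in the counting loop.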
