How to calculate word coverage in the Gutenberg corpus with Python's NLTK library?

Question

Compute the word coverage of all file IDs associated with the text corpus gutenberg. What is the right code for this?

import nltk
from nltk.corpus import gutenberg
from decimal import Decimal

for fileid in gutenberg.fileids():
  n_chars = len(gutenberg.raw(fileid))
  n_words = len(gutenberg.words(fileid))
  print(round(Decimal(n_chars/n_words), 7), fileid)

score 0 · Answer 1 · answered Feb 09 '20 at 11:29


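import nltk

from nltk.corpus import gutenberg

for fileid in gutenberg.fileids():
    # Distinct tokens in this file (case-sensitive, punctuation included)
    total_unique_words = len(set(gutenberg.words(fileid)))
    # All tokens in this file
    total_words = len(gutenberg.words(fileid))
    # Ratio of total tokens to distinct tokens
    print(total_words / total_unique_words, fileid)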

answered Feb 09 '20 at 11:29

GAYATHRI VS


Please don't post only code as an answer, but include an explanation what your code does and how it solves the problem of the question. Answers with an explanation are generally of higher quality, and are more likely to attract upvotes. – Mark Rotteveel Feb 09 '20 at 11:48
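To spell out what the answer's loop computes: it appears to treat word coverage as the number of tokens in a file divided by the number of distinct tokens, which is roughly how many times each distinct word occurs on average. The following is a minimal sketch under that assumed definition, not a definitive implementation. The helper names `word_coverage` and `coverage_by_file` are hypothetical and not part of NLTK; `corpus` is assumed to be any object with `fileids()` and `words(fileid)` methods, as the NLTK corpus readers have.

```python
def word_coverage(words):
    """Return total tokens divided by distinct tokens (0.0 for empty input).

    Assumed definition, taken from the answer's code: tokens are compared
    case-sensitively and punctuation tokens are counted like any other.
    """
    tokens = list(words)
    if not tokens:
        return 0.0
    return len(tokens) / len(set(tokens))


def coverage_by_file(corpus):
    """Map each file ID of an NLTK-style corpus reader to its word coverage."""
    return {
        fileid: word_coverage(corpus.words(fileid))
        for fileid in corpus.fileids()
    }
```

With NLTK installed and the corpus downloaded (for example via `nltk.download('gutenberg')`), calling `coverage_by_file(gutenberg)` after `from nltk.corpus import gutenberg` should give one ratio per file ID. Reading the tokens into a list once also avoids calling `gutenberg.words(fileid)` twice per file.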
