Compute the word coverage of all file IDs associated with the text corpus gutenberg. What is the right code for this?

import nltk
from nltk.corpus import gutenberg
from decimal import Decimal

for fileid in gutenberg.fileids():
  n_chars = len(gutenberg.raw(fileid))
  n_words = len(gutenberg.words(fileid))
  # ratio of characters to words for this file
  print(round(Decimal(n_chars/n_words), 7), fileid)

1 Answer

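import nltk
from nltk.corpus import gutenberg

# Requires the corpus data: nltk.download('gutenberg')
for fileid in gutenberg.fileids():
    words = gutenberg.words(fileid)
    total_unique_words = len(set(words))  # distinct tokens (case-sensitive)
    total_words = len(words)              # all tokens, including punctuation
    # Word coverage: average number of times each distinct token occurs
    print(total_words / total_unique_words, fileid)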
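
To make the ratio easier to check by hand, here is a minimal sketch of the same calculation as a standalone function. It assumes "word coverage" means total tokens divided by distinct tokens, as in the loop above; the name word_coverage and the sample list are hypothetical, chosen only for illustration.

```python
def word_coverage(words):
    """Return total tokens divided by distinct tokens.

    Hypothetical helper: assumes coverage = len(words) / len(set(words)).
    """
    tokens = list(words)
    if not tokens:
        raise ValueError("cannot compute coverage of an empty token sequence")
    return len(tokens) / len(set(tokens))


# Made-up sample: 5 tokens, 4 of them distinct ("the" appears twice)
sample = ["the", "cat", "saw", "the", "dog"]
print(word_coverage(sample))  # 5 / 4 = 1.25

# With the corpus downloaded, the same helper could be applied per file:
#   word_coverage(gutenberg.words(fileid))
```

A higher value means each distinct token is repeated more often on average. Tokens are compared case-sensitively here, so "The" and "the" count as different words; lowercasing first would give a different number.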
  • Please don't post only code as an answer, but include an explanation of what your code does and how it solves the problem in the question. Answers with an explanation are generally of higher quality, and are more likely to attract upvotes. – Mark Rotteveel Feb 09 '20 at 11:48