
Has anyone here ever used the readability 0.2 or textstat 0.3.1 package in Python? I couldn't find anything on SO dealing with this subject, nor any good documentation on it.

So far my code iterates over a number of locally stored .txt files and prints the results (readability measures) into a master text file:

import contextlib
import glob
import os
import sys

from textstat.textstat import textstat


@contextlib.contextmanager
def stdout2file(fname):
    f = open(fname, 'w', encoding="utf-8")
    sys.stdout = f
    try:
        yield
    finally:
        # restore stdout even if an error occurs inside the with-block
        sys.stdout = sys.__stdout__
        f.close()


def readability():
    os.chdir(r"F:\Level1\Level2")
    with stdout2file("Results_readability.txt"):
        for file in glob.iglob("*.txt"):  # iterates over all files in the directory ending in .txt
            with open(file, encoding="utf8") as fin:
                contents = fin.read()
                print(textstat.flesch_reading_ease(contents))
                print(file.split(os.path.sep)[-1], end=" | ")
                print(textstat.smog_index(contents), end="\n ")
                print(file.split(os.path.sep)[-1], end=" | ")
                print(textstat.gunning_fog(contents), end="\n ")


if __name__ == '__main__':
    readability()

This works pretty well, but I have two problems:

  1. Is it possible to store my master file in another directory? With the code above, the master file is created in the same directory as the files being iterated over, which is rather pointless...

  2. Does anyone have experience with how accurate these packages are? I tested the same string in textstat and at http://www.webpagefx.com/tools/read-able/check.php and http://gunning-fog-index.com/ and got significantly different results on all measures.

Any help appreciated.

Florian Schramm
2 Answers


I suspect that textstat uses different coefficients. A simple check: run it on one sentence consisting of one word consisting of one syllable. I used the text "No.":

In: textstat.flesch_kincaid_grade("No.")
Out: -4.6

But according to the formula in the literature, the answer should be -3.4 (that is, 0.39*1 + 11.8*1 - 15.59).
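This hand-check can be reproduced directly. The sketch below computes the standard published Flesch-Kincaid grade formula for a one-sentence, one-word, one-syllable text (the function name is ours, not textstat's):

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade formula from the literature."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# One sentence, one word, one syllable ("No."):
grade = flesch_kincaid_grade(words=1, sentences=1, syllables=1)
print(round(grade, 2))  # -3.4, not the -4.6 that textstat 0.3.1 reports
```

If textstat returns a different value for the same counts, it must be using different coefficients (or counting words, sentences, or syllables differently).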

Ilia

For the first question: you can put the output file in any directory, just give it a full path, such as "F:\Level1\Level2\Results_readability.txt", or a relative path, such as "..\Other\Results_readability.txt".
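A minimal sketch of building such a path (the directory name here is hypothetical, and a temp directory is used so the snippet runs anywhere; in your case it would be a fixed folder like F:\Results):

```python
import os
import tempfile

# Hypothetical output directory; any directory outside the input folder works.
results_dir = os.path.join(tempfile.gettempdir(), "readability_results")
os.makedirs(results_dir, exist_ok=True)  # create it if it doesn't exist

# Full path for the master file, independent of the working directory
# that the input .txt files are iterated from.
out_path = os.path.join(results_dir, "Results_readability.txt")
with open(out_path, "w", encoding="utf-8") as f:
    f.write("file.txt | 12.3\n")
```

Passing `out_path` to your `stdout2file` context manager instead of a bare filename is enough; `os.chdir` only affects where relative paths resolve.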

For the second question: YMMV. Readability is not an exact science. It is possible to construct a sentence that uses short but obscure words and so appears easy to read but isn't.

Then again, counting syllables requires various heuristics to decide how to split a word into syllables, and if these are failing on your text, that would cause errors. That said, textstat does implement correct versions of the various readability indices. If the results differ, you can investigate why.
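To illustrate how such heuristics diverge, here is one common naive vowel-group syllable counter (this is not textstat's actual algorithm). Different tools use different rules of this kind, so their syllable counts, and therefore their indices, differ:

```python
import re

def naive_syllable_count(word):
    """Count runs of consecutive vowels as syllables and drop one for a
    trailing silent 'e'. One common heuristic, not textstat's algorithm."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1  # assume a trailing 'e' is silent
    return max(count, 1)

print(naive_syllable_count("reading"))  # 2 (correct)
print(naive_syllable_count("obscure"))  # 2 (correct; silent-e rule fires)
print(naive_syllable_count("science"))  # 1 (wrong, actually 2)
```

Words like "science", where adjacent vowels belong to separate syllables, are exactly where heuristics disagree, and a few such disagreements per hundred words is enough to shift a grade-level index noticeably.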

James K
  • Thanks for your answer. I know; I tried using os.chdir in my stdout2file decorator, but it didn't work: it created a txt file, but without any content... For the second question: I wouldn't have any problems if the results were more or less the same, but comparing my results with e.g. the gunning-fog-index website I get results of 23.54 vs. 11.55, which makes a huge difference (using the last paragraph of your answer ;) )... That's why I was asking about experiences with this package and the accuracy of the implementation. – Florian Schramm Sep 16 '16 at 19:42