I wrote a script to iterate through multiple text files in a directory and count the words in each that also appear in a dictionary file. I wrote and tested the script with two files in the directory and got it working perfectly: the script prints two accurate integers, one for each file. However, once I add new files to the directory, I get a FileNotFoundError. The file is definitely in there! Can anyone tell me what it is about the code that is causing this? I've gone through various similar posts on StackOverflow with no success. The newly added file has all the same properties as the existing two.

Code (word_count_from_dictionary_iterating.py):

import os
import sys
import nltk
nltk.download()
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import io

files_path = sys.argv[1]
textfile_dictionary = sys.argv[2]

for filename in os.listdir(files_path):
    if filename.endswith(".txt"):

        #accessing file for processing
        file = open(filename, "rt")
        text = file.read()

        #tokenize text file
        tokens = word_tokenize(text)

        #remove non-alphabetical characters
        words = []

        for word in tokens:
            if word.isalpha():
                words.append(word)

        #remove stopwords
        stop_words = stopwords.words("english")
        words_without_stops = []

        for w in words:
            if not w in stop_words:
                words_without_stops.append(w)
                

        #lemmatize remaining tokens and print
        lemmatizer = WordNetLemmatizer()
        lemmas = []
        for x in words_without_stops:
            lemmatizer.lemmatize(x)
            lemmas.append(x)

        #turn dictionary held in text file into a list of tokens
        file = io.open(textfile_dictionary, mode="r", encoding="utf8")
        dictionaryread = file.read()
        dictionary = dictionaryread.split()

        #count instances of each word in dictionary in the novel and add them up
        word_count = 0

        for element in dictionary:
            for lemma in lemmas:
                if lemma == element:
                    word_count = word_count + 1

        print(word_count)

Results on command line with just two test files in the directory:

c@Computer:~/Dropbox/programming/first_project$ python3 word_count_from_dictionary_iterating.py directoryaddress dictionary.txt
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
241
229

Results after adding a new file (newfile.txt) to the directory:

c@Computer:~/Dropbox/programming/first_project$ python3 word_count_from_dictionary_iterating.py directoryaddress happy_words.txt
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
241
229
Traceback (most recent call last):
  File "word_count_from_dictionary_iterating.py", line 17, in <module>
    file = open(filename, "rt")
FileNotFoundError: [Errno 2] No such file or directory: 'newfile.txt'

If I run ls on the directory, the file is showing up. If I apply the script, adjusted without the iterating loop, to newfile.txt, it works. But it's just not working when looping through the directory.

Any help appreciated; I am new to programming.

claudiae
  • Just to check for simple issues, what is `directoryaddress` in your example? `newfile.txt` is in this directory or is it in `~/Dropbox/programming/first_project` or are they the same? – Tyberius Jan 10 '22 at 04:03
  • newfile.txt is in ~/Dropbox/programming/first_project/directoryaddress. directoryaddress is the folder containing the files I want to iterate through. Thanks for taking the time to help! – claudiae Jan 10 '22 at 19:56

1 Answer

The issue is that `os.listdir` returns bare file names, so when you run `file = open(filename, "rt")`, Python looks for `filename` in the directory where you started it (`~/Dropbox/programming/first_project/`), but the file you want to read is in `~/Dropbox/programming/first_project/directoryaddress`.
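
The behaviour can be reproduced with a minimal sketch. It assumes a temporary directory as a stand-in for `directoryaddress`, and the helper name `bare_and_joined` is hypothetical, not part of the original script. It pairs each name returned by `os.listdir` with the path you get after joining it to the directory, then checks which of the two actually resolves.

```python
import os
import tempfile


def bare_and_joined(directory):
    """Return (bare name, joined path) pairs for each entry in `directory`."""
    return [
        (name, os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
    ]


# Hypothetical setup: a temporary directory standing in for `directoryaddress`.
with tempfile.TemporaryDirectory() as files_path:
    with open(os.path.join(files_path, "newfile.txt"), "w") as f:
        f.write("hello")

    pairs = bare_and_joined(files_path)
    for name, full_path in pairs:
        # The bare name is resolved against the current working directory,
        # so it only exists if a same-named file happens to sit there.
        print(name, os.path.exists(name), os.path.exists(full_path))
```

The joined path always resolves while the directory exists. The bare name resolves only if a file with that name is also present in the directory Python was started from.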

To ensure you are reading the right file, you should either pass its full path as `filename` or, if you know you will always find it in some subdirectory, simply prepend the path to `filename` before trying to read it: `file = open(files_path+"/"+filename, "rt")` (there are cleaner ways to combine paths, such as the standard library's `pathlib`).
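
As one possible version of the `pathlib` approach, here is a short sketch. The function name `read_text_files` is hypothetical, and the `utf8` encoding is an assumption borrowed from how the question opens the dictionary file. Each `Path` produced by `iterdir()` already includes the directory, so the file opens correctly regardless of where Python was started.

```python
from pathlib import Path


def read_text_files(files_path):
    """Yield (path, text) for every .txt file directly inside `files_path`."""
    for path in sorted(Path(files_path).iterdir()):
        if path.is_file() and path.suffix == ".txt":
            # `path` carries the directory with it, so no manual joining
            # is needed. The utf8 encoding is an assumption.
            yield path, path.read_text(encoding="utf8")


# Possible use inside the original script:
# for path, text in read_text_files(sys.argv[1]):
#     tokens = word_tokenize(text)
```

Using `read_text` also closes each file after reading, which the original loop does not do explicitly.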

Tyberius
  • Thank you! I ended up inserting `file = open(os.path.join(files_path, filename), "rt")` in place of `file = open(filename, "rt")` to resolve the directory path problem. – claudiae Jan 14 '22 at 15:12