I wrote a script that iterates through the text files in a directory and, for each file, counts the words that also appear in a dictionary file. I wrote and tested the script with two files in the directory and got it working perfectly: it prints two accurate integers, one for each file. However, once I add a new file to the directory, I get a FileNotFoundError. The file is definitely in there! Can anyone tell me what it is about the code that is causing this? I've gone through various similar posts on StackOverflow with no success. The newly added file has all the same properties as the existing two.
Code (word_count_from_dictionary_iterating.py):
import os
import sys
import nltk
nltk.download()
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import io
files_path = sys.argv[1]
textfile_dictionary = sys.argv[2]
for filename in os.listdir(files_path):
    if filename.endswith(".txt"):
        #accessing file for processing
        file = open(filename, "rt")
        text = file.read()
        #tokenize text file
        tokens = word_tokenize(text)
        #remove non-alphabetical characters
        words = []
        for word in tokens:
            if word.isalpha():
                words.append(word)
        #remove stopwords
        stop_words = stopwords.words("english")
        words_without_stops = []
        for w in words:
            if not w in stop_words:
                words_without_stops.append(w)
        #lemmatize remaining tokens and print
        lemmatizer = WordNetLemmatizer()
        lemmas = []
        for x in words_without_stops:
            lemmatizer.lemmatize(x)
            lemmas.append(x)
        #turn dictionary held in text file into a list of tokens
        file = io.open(textfile_dictionary, mode="r", encoding="utf8")
        dictionaryread = file.read()
        dictionary = dictionaryread.split()
        #count instances of each word in dictionary in the novel and add them up
        word_count = 0
        for element in dictionary:
            for lemma in lemmas:
                if lemma == element:
                    word_count = word_count + 1
        print(word_count)
Results on command line with just two test files in the directory:
c@Computer:~/Dropbox/programming/first_project$ python3 word_count_from_dictionary_iterating.py directoryaddress dictionary.txt
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
241
229
Results after adding a new file (newfile.txt) to the directory:
c@Computer:~/Dropbox/programming/first_project$ python3 word_count_from_dictionary_iterating.py directoryaddress happy_words.txt
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
241
229
Traceback (most recent call last):
  File "word_count_from_dictionary_iterating.py", line 17, in <module>
    file = open(filename, "rt")
FileNotFoundError: [Errno 2] No such file or directory: 'newfile.txt'
If I run ls on the directory, the file shows up. If I take out the loop and run the script on newfile.txt directly, it works. It only fails when looping through the directory.
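For what it's worth, a stripped-down sketch of the loop (no NLTK) may help narrow this down. It relies on one documented behaviour: os.listdir returns bare file names rather than full paths, so open(filename) looks in the current working directory and not in the directory passed on the command line. The file name hypothetical_newfile_for_demo.txt and the helpers list_text_files and demo are made up for illustration; they are not part of the script above.

```python
import os
import tempfile


def list_text_files(directory):
    """Return (bare_name, joined_path) pairs for the .txt files in directory."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            # os.listdir yields names only, so build the full path explicitly
            pairs.append((name, os.path.join(directory, name)))
    return pairs


def demo():
    """Create a throwaway directory with one file and compare both lookups."""
    with tempfile.TemporaryDirectory() as directory:
        # Hypothetical file name, chosen so it will not already exist in the cwd
        target = os.path.join(directory, "hypothetical_newfile_for_demo.txt")
        with open(target, "w", encoding="utf8") as handle:
            handle.write("some happy words")

        results = []
        for name, path in list_text_files(directory):
            bare_found = os.path.exists(name)    # resolved against the cwd
            joined_found = os.path.exists(path)  # resolved inside the directory
            results.append((name, bare_found, joined_found))
        return results


if __name__ == "__main__":
    for name, bare_found, joined_found in demo():
        print(name, "| bare name found:", bare_found,
              "| joined path found:", joined_found)
```

If the joined path is found while the bare name is not, that would suggest the two original test files only worked because copies of them also happened to sit in the folder the script was run from. That is an assumption; nothing above confirms it.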
Any help is appreciated; I am new to programming.