I have many .txt files like this:
But they are in a few different languages, so the user specifies the language this way:
lng = input("In what language is the text typed? ('ca' for catalan, 'es' for spanish, 'en' for english...)\n")
I would like to delete all the stopwords and save the resulting text to another .txt file. I'm using Stanza because I want to do sentiment analysis later on, but I can't figure out how to do the stopword removal with it. I also tried it with spaCy, because it's much faster, but I couldn't manage it either. This is what I have tried:
import spacy

sp = spacy.load(str(lng) + '_core_web_sm')  # the inputted language is stored in 'lng'
all_stopwords = sp.Defaults.stop_words

y = open('NODUP_FILTERED_' + filename, 'r', encoding='utf-8')
txt = y.read()

for line in range(rn):
    for word in txt:
        if word in all_stopwords:
            word = ''

print(txt)
This returns the following traceback:
OSError: [E050] Can't find model 'es_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
This happens even though I have spaCy and 'es_core_web_sm' installed.
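In case it helps clarify what I'm after, here is a minimal sketch of what I think the stopword removal should look like, under two assumptions of mine: that the Spanish small model is actually named 'es_core_news_sm' (a guess based on the spaCy models page), and that checking token.is_stop is the right way to filter. The 'NOSTOP_' output prefix is just something I made up for illustration, and 'filename' is defined earlier in my script:

import spacy

# sketch only: I am assuming the Spanish small model is called
# 'es_core_news_sm' (not 'es_core_web_sm')
sp = spacy.load('es_core_news_sm')

with open('NODUP_FILTERED_' + filename, 'r', encoding='utf-8') as y:
    txt = y.read()

doc = sp(txt)
# keep every token that spaCy does not flag as a stopword
filtered = ' '.join(tok.text for tok in doc if not tok.is_stop)

# 'NOSTOP_' is a hypothetical output prefix, just for illustration
with open('NOSTOP_' + filename, 'w', encoding='utf-8') as out:
    out.write(filtered)

Is something like this the right approach, and why does the model fail to load in the first place?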