The purpose of this program is to find an example sentence for each word in ner.txt. For example, if the word apple is in ner.txt and some sentence contains the word apple, then I would like to output something like apple: you should buy an apple juice.
The logic of the code is simple, since I only need one example sentence per word in ner.txt. I am using NLTK to split the text into sentences.

The problem is at the bottom of the code: I use two nested for loops to find an example sentence for each word. This is painfully slow and unusable for large files. How can I make this efficient, or is there a better approach than my current logic?
from nltk.tokenize import sent_tokenize

news_articles = "test.txt"
oov_ner = "ner.txt"

# Read the articles and split them into sentences.
news_data = ""
with open(news_articles, "r") as inFile:
    news_data = inFile.read()
base_news = sent_tokenize(news_data)

# Read the target words, one per line.
with open(oov_ner, "r") as oovNER:
    oov_ner_content = oovNER.readlines()
oov_ner_data = [x.strip() for x in oov_ner_content]

# For every word, scan every sentence -- this is the slow part.
my_dict = {}
for oovner in oov_ner_data:
    for news in base_news:
        if oovner in news:
            my_dict[oovner] = news
print(my_dict)
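One possible way to avoid the word-times-sentence scan is to invert the loops: walk the sentence list once, tokenize each sentence into a set of words, and look up only the words that actually occur there. The sketch below shows that idea on hard-coded sample data rather than the real test.txt/ner.txt files; note it matches whole words via a regex, which behaves slightly differently from the substring check (`oovner in news`) in the original, and it keeps the first matching sentence instead of the last.

```python
import re

def first_examples(sentences, words):
    """Map each target word to the first sentence containing it as a whole word.

    Scans the sentence list once; for each sentence, only the words that
    actually appear are looked up in a set, so the cost is roughly
    O(total tokens) instead of O(len(words) * len(sentences)).
    """
    remaining = set(words)   # words still missing an example sentence
    examples = {}
    for sentence in sentences:
        if not remaining:    # stop early once every word is covered
            break
        tokens = set(re.findall(r"\w+", sentence))  # whole-word tokens
        for word in tokens & remaining:
            examples[word] = sentence
            remaining.discard(word)
    return examples

sentences = ["You should buy an apple juice.", "Bananas are yellow."]
print(first_examples(sentences, ["apple", "banana"]))
# {'apple': 'You should buy an apple juice.'}
```

In your script you would pass `base_news` and `oov_ner_data` to this function. The set intersection is what removes the inner scan over all words, and the early `break` lets the loop stop as soon as every word has an example.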