I have a file with content like the following:
Someone says; Hello; Someone responded Hello back
Someone again said; Hello; No response
Someone again said; Hello waiting for response
I have a Python script that counts the number of times a particular word occurs in a file. The script is as follows:
#!/usr/bin/env python
filename = "/path/to/file.txt"
number_of_words = 0
search_string = "Hello"
with open(filename, 'r') as file:
    for line in file:
        # Split on whitespace; punctuation stays attached to each word
        words = line.split()
        for i in words:
            if i == search_string:
                number_of_words += 1
print("Number of words in " + filename + " is: " + str(number_of_words))
I expect the output to be 4, since Hello occurs 4 times, but I get 2 instead. The output of the script is:
Number of words in /path/to/file.txt is: 2
I understand that "Hello;" is not counted as "Hello" because the word does not exactly match the one searched for.
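To illustrate the mismatch, here is a minimal sketch that splits the first line of the sample file the same way the script does. The variable names `line`, `tokens`, and `matches` are only for illustration and are not part of the script above.

```python
# First line of the sample file shown above.
line = "Someone says; Hello; Someone responded Hello back"

# str.split() with no arguments splits on whitespace only,
# so trailing punctuation stays attached to each token.
tokens = line.split()
print(tokens)

# Only tokens that equal "Hello" exactly are counted;
# "Hello;" is a different string and is skipped.
matches = [t for t in tokens if t == "Hello"]
print(len(matches))
```

On this line the token list contains both "Hello;" and "Hello", and only the second one passes the equality check.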
Question:
Is there a way to make my script count "Hello" even if it is followed by a comma, a semicolon, or a period? I am looking for a simple technique that doesn't require searching for substrings again within each word found.
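A minimal sketch of one technique that might fit this description, assuming it is acceptable to strip leading and trailing punctuation from each token before comparing. The helper name `count_word` and the in-memory `sample` list are hypothetical and stand in for reading the real file; `str.strip` and `string.punctuation` are from the standard library.

```python
import string


def count_word(lines, search_string):
    """Count tokens equal to search_string, ignoring surrounding punctuation.

    Hypothetical helper for illustration; `lines` is any iterable of strings,
    such as an open file object.
    """
    count = 0
    for line in lines:
        for token in line.split():
            # Remove punctuation from both ends of the token, e.g. "Hello;" -> "Hello".
            if token.strip(string.punctuation) == search_string:
                count += 1
    return count


# The three lines of the sample file, used here instead of opening a file.
sample = [
    "Someone says; Hello; Someone responded Hello back",
    "Someone again said; Hello; No response",
    "Someone again said; Hello waiting for response",
]
result = count_word(sample, "Hello")
print(result)
```

This only strips punctuation at the ends of a token, so a string such as "Hello;world" with no space would still not match. Whether that matters depends on the real file.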