I am using the Brown corpus through NLTK's "brown.words()", which gives me a list of 1161192 words.
Now I want to find every occurrence of the word "have" in any of its forms, so whenever the corpus contains "has", "had", "haven't", etc., I want to do something with it (push it into an array, increment a counter, or something else).
Edit: Note that this question is about matching word forms. If I search for "have", I want a way to match it to "haven't" or "had", so .count() would not solve this problem, as it doesn't help with matching anything.
Example code I would use if stemming/lemmatization worked the way I expected:
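from nltk.corpus import brown
from nltk.stem import WordNetLemmatizer

def findWordFamily(findWord):
    wordFamily = []
    lmtzr = WordNetLemmatizer()
    # reduce the search term to its lemma once, up front
    findWord = lmtzr.lemmatize(findWord)
    for word in brown.words():
        lemma = lmtzr.lemmatize(word)
        # keep the original token whenever its lemma matches
        if lemma == findWord:
            wordFamily.append(word)
    return wordFamily

print(findWordFamily("have"))
# desired output: ["have", "have", "had", "having", "haven't", "having"]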
But the problem is that:
for word in brown.words():
    lemma = lmtzr.lemmatize(word)
    # if word is "having", lemma is also "having" instead of "have"
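A likely cause of the behaviour above is that WordNetLemmatizer.lemmatize treats its input as a noun unless a part of speech is passed (its pos parameter defaults to "n"), so verb forms such as "having" come back unchanged. The following is only a minimal sketch under that assumption, not a definitive fix. The names find_word_family and brown_verb_family are hypothetical, and the lemmatizer is passed in as a plain function so the matching logic can be tried without loading the corpus.

```python
from typing import Callable, Iterable, List


def find_word_family(target: str,
                     tokens: Iterable[str],
                     lemmatize: Callable[[str], str]) -> List[str]:
    """Collect every token whose lemma equals the lemma of `target`.

    Tokens are lowercased before lemmatizing so that "Have" and "have"
    are treated alike; the original token is what gets returned.
    """
    target_lemma = lemmatize(target.lower())
    return [tok for tok in tokens if lemmatize(tok.lower()) == target_lemma]


def brown_verb_family(target: str) -> List[str]:
    """Hypothetical wrapper: search the Brown corpus using verb lemmas.

    Assumes NLTK is installed and the 'brown' and 'wordnet' data
    have already been downloaded.
    """
    from nltk.corpus import brown
    from nltk.stem import WordNetLemmatizer

    lmtzr = WordNetLemmatizer()
    # pos="v" asks for the verb lemma; without it the default is "n" (noun)
    return find_word_family(target, brown.words(),
                            lambda w: lmtzr.lemmatize(w, pos="v"))
```

Calling brown_verb_family("have") should then pick up forms such as "has", "had" and "having". Contractions like "haven't" are probably not covered by WordNet and may still need separate handling, for example by stripping "n't" before lemmatizing.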