I am trying to create a program that runs through a list of mental health terms, looks in a research abstract, and counts the number of times each word or phrase appears. I can get this to work with single words, but I'm struggling to do it with multi-word phrases. I also tried NLTK ngrams, but since the number of words per term varies (i.e., not every term in the mental health list is a bigram or trigram), I couldn't get that to work either.
I want to emphasize that I know splitting the abstract into individual words only allows single words to be counted. I'm just stuck on how to count terms of varying length from my list in the abstract.
Thanks!
    from collections import Counter

    abstracts = [
        'This is a mental health abstract about anxiety and bipolar '
        'disorder as well as other things.',
        'While this abstract is not about ptsd or any trauma-related '
        'illnesses, it does have a mental health focus.',
    ]
    mh_terms = ['bipolar disorder', 'anxiety', 'substance abuse disorder',
                'ptsd', 'schizophrenia', 'mental health']

    for x2 in abstracts:
        # Keys are single lowercased words, so multi-word terms never match
        c = Counter(s.lower().replace('.', '') for s in x2.split())
        for term in mh_terms:
            term = term.replace(',', '')
            term = term.replace('.', '')
            xx = (term, c.get(term, 0))  # (term, count) pair; not used further
        mh_total_occur = sum(c.get(v, 0) for v in mh_terms)
        print(mh_total_occur)
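From my example, both abstracts get a count of 1, because only the single-word terms ('anxiety' and 'ptsd') are matched. I want the multi-word terms counted as well, which would give 3 for the first abstract and 2 for the second.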
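A minimal sketch of one way to count terms of any length, assuming case-insensitive, whole-phrase matching is what is wanted. It searches the full abstract text with a regular expression per term instead of splitting into words first. The helper name `count_terms` is hypothetical and only for illustration.

```python
import re

def count_terms(text, terms):
    """Return {term: occurrences} using case-insensitive whole-phrase matches."""
    counts = {}
    for term in terms:
        # \b stops a term from matching inside a longer word;
        # re.escape protects any punctuation contained in the term itself.
        pattern = r'\b' + re.escape(term) + r'\b'
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

abstracts = [
    'This is a mental health abstract about anxiety and bipolar '
    'disorder as well as other things.',
    'While this abstract is not about ptsd or any trauma-related '
    'illnesses, it does have a mental health focus.',
]
mh_terms = ['bipolar disorder', 'anxiety', 'substance abuse disorder',
            'ptsd', 'schizophrenia', 'mental health']

# One total per abstract, summed over all terms
totals = [sum(count_terms(a, mh_terms).values()) for a in abstracts]
print(totals)  # [3, 2]
```

Each term is counted independently here, so if the list held both 'disorder' and 'bipolar disorder', a single occurrence of "bipolar disorder" would add to both counts.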