As a first step, I need to identify all abbreviations and hyphenated words in my sentences and print them as they are identified. My code does not identify them correctly:
```python
import re

# column of free-text responses in the DataFrame df1
sentence_stream2 = df1['Open End Text']
for sent in sentence_stream2:
    abbs_ = re.findall(r'(?:[A-Z]\.)+', sent)    # abbreviations
    hypns_ = re.findall(r'\w+(?:-\w+)*', sent)   # hyphenated words
    print("new sentence:")
    print(sent)
    print(abbs_)
    print(hypns_)
```
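Running the two patterns from the loop on short strings (the test strings are made up for illustration) shows what each one actually matches:

```python
import re

# Original abbreviation pattern: one or more "capital letter + period" pairs.
# It only finds dotted forms such as "U.S."; undotted acronyms like "BI" never match.
abbr_dotted = r'(?:[A-Z]\.)+'
print(re.findall(abbr_dotted, "Shipped to the U.S. and BI teams"))  # ['U.S.']

# Original hyphen pattern: the "(?:-\w+)*" group may repeat zero times,
# so a plain word with no hyphen is also a full match.
hyphen_optional = r'\w+(?:-\w+)*'
print(re.findall(hyphen_optional, "cloud Self-service BI"))  # ['cloud', 'Self-service', 'BI']
```

So the first pattern returns `[]` for any sentence without dotted abbreviations, and the second returns every word.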
One of the sentences in my corpus is:

```
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
```
The output for this is:

```
new sentence:
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
[]
['DevOps', 'with', 'APIs', 'event-driven', 'architecture', 'using', 'cloud', 'Data', 'Analytics', 'environment', 'Self-service', 'BI']
```
The expected output is:

```
new sentence:
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
['APIs', 'BI']
['event-driven', 'Self-service']
```
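A minimal sketch of corrected patterns, assuming an "abbreviation" here means an all-caps word of two or more letters with an optional plural `s` (which covers both `APIs` and `BI`), plus dotted forms like `U.S.`. The hyphen fix changes the trailing `*` to `+`, so at least one `-word` part is required. `find_abbrs_and_hyphens` is a hypothetical helper name, and the `sentences` list stands in for `df1['Open End Text']`.

```python
import re

# Assumed definition of an abbreviation (adjust to your corpus):
#   * two or more capital letters, optionally followed by a plural "s" ("BI", "APIs"), or
#   * a dotted form of at least two letters such as "U.S." (so a lone "B." is skipped).
ABBR_RE = re.compile(r'\b[A-Z]{2,}s?\b|(?:[A-Z]\.){2,}')

# "+" instead of "*": at least one "-word" part is required, so plain words no longer match.
HYPHEN_RE = re.compile(r'\w+(?:-\w+)+')


def find_abbrs_and_hyphens(sentence):
    """Return (abbreviations, hyphenated_words) found in one sentence."""
    return ABBR_RE.findall(sentence), HYPHEN_RE.findall(sentence)


# Stand-in for df1['Open End Text']; with pandas, iterate over
# df1['Open End Text'].dropna() the same way to skip missing (NaN) rows.
sentences = [
    "DevOps with APIs & event-driven architecture using cloud "
    "Data Analytics environment Self-service BI",
]

for sent in sentences:
    abbs_, hypns_ = find_abbrs_and_hyphens(sent)
    print("new sentence:")
    print(sent)
    print(abbs_)   # ['APIs', 'BI']
    print(hypns_)  # ['event-driven', 'Self-service']
```

Note that `\b[A-Z]{2,}s?\b` also matches ordinary words written entirely in capitals (e.g. "THE"), while mixed-case terms such as `DevOps` are deliberately excluded; widen or narrow the pattern to fit the corpus.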