
I want to extract only the complete words that an acronym in parentheses stands for.

For example, given the sentence 'Lung cancer screening (LCS) reduces NSCLC mortality', I want to get 'Lung cancer screening' as the result.

How can I do it with regex?
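One possible approach to the extraction described above is the following minimal sketch. It is not taken from either answer. The helper name `expand_acronyms` is hypothetical, and the sketch assumes that the expansion consists of exactly the words immediately before the parentheses, one word per acronym letter, and that their initials spell the acronym.

```python
import re


def expand_acronyms(text: str) -> dict:
    """Map each parenthesized acronym to the words just before it.

    Assumption: the expansion has one word per acronym letter and
    the initials of those words spell the acronym.
    """
    result = {}
    # Find every "(ABC)"-style acronym of two or more capital letters
    for m in re.finditer(r'\(([A-Z]{2,})\)', text):
        acronym = m.group(1)
        # Take as many preceding words as the acronym has letters
        words = text[:m.start()].split()
        candidate = words[-len(acronym):]
        # Accept the candidate only if its initials spell the acronym
        initials = ''.join(w[0] for w in candidate).upper()
        if initials == acronym:
            result[acronym] = ' '.join(candidate)
    return result


print(expand_acronyms('Lung cancer screening (LCS) reduces NSCLC mortality'))
# {'LCS': 'Lung cancer screening'}
```

The initials check keeps the function from returning unrelated words when the preceding text does not match the acronym. Expansions that skip small words such as "of" or "and" would need a looser rule.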


Original question: I want to remove runs of repeated uppercase letters: "HIV acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer" => " acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer"

정다라
  • Show your own effort (code) as properly formatted text in the question. You can use https://regex101.com to play with regular expressions (set flavor to Python). – Michael Butscher Nov 27 '22 at 08:51

2 Answers

import re
s = 'HIV acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer'
# Remove whole words made of 2+ uppercase letters, then trim the leftover leading space
print(re.sub(r'\b[A-Z]{2,}\b', '', s).strip())

Based on @jensgram's answer.

Mouayad_Al

Assuming you want to target two or more consecutive capital letters, I would use re.sub here:

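import re

inp = "Lung cancer screening (LCS) reduces NSCLC mortality"
# Replace a parenthesized acronym or a run of 2+ capitals (with surrounding spaces) by one space
output = re.sub(r'\s*(?:\([A-Z]+\)|[A-Z]{2,})\s*', ' ', inp).strip()
print(output)  # Lung cancer screening reduces mortality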
Tim Biegeleisen