I'm trying to write a Python regex that extracts German addresses like the ones shown below.
Abc Gmbh Ensisheimer Straße 6-8 79346 Endingen
Def Gmbh Keltenstr . 16 77971 Kippenheim Deutschland
Ghi Deutschland Gmbh 53169 Bonn
Jkl Gmbh Ensisheimer Str . 6 -8 79346 Endingen
I wrote the code below to test the individual address components, and I also combined them into a single regex, but it still fails to detect the addresses above. Can anyone please help me with it?
import re
# TEST COMPANY NAME
string = 'Telekom Deutschland Gmbh 53169 Bonn Datum'
result = re.findall(r'([a-zA-Zäöüß]+\s*?[A-Za-zäöüß]+\s*?[A-Za-zäöüß]?)',string,re.MULTILINE)
print(result)
# TEST STREET NAME
result = re.findall(r'([a-zA-Zäöüß]+\s*\.)',string)
print(result)
# TEST STREET NUMBER
result = re.findall(r'(\d{1,3}\s*[a-zA-Z]?[+|-]?\s*[\d{1,3}]?)',string)
print(result)
# TEST POSTAL CODE
result = re.findall(r'(\d{5})',string)
print(result)
# TEST CITY NAME
result = re.findall(r'([A-Za-z]+)?',string)
print(result)
# TEST COMBINED ADDRESS COMPONENTS GROUP
result = re.findall(r'([a-zA-Zäöüß]+\s+?[A-Za-zäöüß]+\s+?[A-Za-zäöüß]+\s+([a-zA-Zäöüß]+\s*\.)+?\s+(\d{1,3}\s*[a-zA-Z]?[+|-]?\s*[\d{1,3}]?)+\s+(\d{5})+\s+([A-Za-z]+))',string)
print(result)
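While checking the pieces one by one, a few constructs in the patterns above turn out to behave differently from what they look like. Inside square brackets, quantifier and alternation syntax is taken literally, and an optional group can match the empty string. The combined pattern also makes the street part mandatory and requires a literal dot after the street name, so a string without a street, such as the test string, cannot match. A small check of the first three points, using throwaway variable names that are not part of the code above:

```python
import re

# Inside [...] the braces are literal: [\d{1,3}] matches ONE character
# that is a digit, '{', ',' or '}' (not "one to three digits").
brace_is_matched = re.fullmatch(r'[\d{1,3}]', '{') is not None

# Likewise [+|-] matches '+', '|' or '-'; the '|' is not alternation here.
pipe_is_matched = re.fullmatch(r'[+|-]', '|') is not None

# An optional group may match nothing, so findall also returns '' entries.
city_hits = re.findall(r'([A-Za-z]+)?', '53169 Bonn')

print(brace_is_matched, pipe_is_matched, city_hits)
```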
Please note my objective: if any of these addresses appears inside a large paragraph of text, the regex should extract and print only the addresses.
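To make the goal concrete, here is a minimal sketch of one possible direction, not a definitive solution. It rests on several assumptions drawn only from the four samples: every address starts with a company name ending in "Gmbh", the street and house number are optional, the city is a single capitalised word, and the optional country is "Deutschland" or "Germany". The names `ADDRESS_RE`, `extract_addresses` and `paragraph` are made up for illustration.

```python
import re

# Hypothetical pattern, written only against the four sample addresses.
ADDRESS_RE = re.compile(r"""
    # company: up to three capitalised words, then the legal form "GmbH"
    \b(?P<company>(?:[A-ZÄÖÜ][A-Za-zÄÖÜäöüß]*\s+){1,3}?G[Mm][Bb][Hh])
    (?:
        # street: capitalised words, optionally ending in an abbreviation dot
        \s+(?P<street>(?:[A-ZÄÖÜ][a-zäöüß]+\s+)*?[A-ZÄÖÜ][a-zäöüß]+\s*\.?)
        # house number: "16", "6-8" or "6 -8", with an optional letter suffix
        \s+(?P<number>\d{1,4}[a-zA-Z]?(?:\s*-\s*\d{1,4}[a-zA-Z]?)?)
    )?
    \s+(?P<postcode>\d{5})              # five-digit postal code
    \s+(?P<city>[A-ZÄÖÜ][a-zäöüß]+)     # single-word city name (assumption)
    (?:\s+(?P<country>Deutschland|Germany))?
""", re.VERBOSE)


def extract_addresses(text):
    """Return every substring of text that looks like one of the samples."""
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]


paragraph = (
    "Bitte senden Sie die Rechnung an Abc Gmbh Ensisheimer Straße 6-8 "
    "79346 Endingen und eine Kopie an Def Gmbh Keltenstr . 16 77971 "
    "Kippenheim Deutschland sowie an Ghi Deutschland Gmbh 53169 Bonn "
    "oder Jkl Gmbh Ensisheimer Str . 6 -8 79346 Endingen ."
)

for address in extract_addresses(paragraph):
    print(address)
```

Known limits of this sketch: capitalised words that directly precede the company name (up to three in total) are absorbed into the match, and hyphenated street names or multi-word cities are not covered. Named groups such as `company` and `postcode` are available on each match object if the individual components are needed.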