I'm working on extracting data from a string and outputting it as "companyID", "date" and "amount".
I have read through "Python regex: matching a parenthesis within parenthesis", which let me output the date and amount successfully, but I'm stuck on two things:
1. The "companyID" is not output correctly.
2. When multiple strings come in, I can't list them properly.
Q1: The "companyID" is not output correctly
```python
import re

text = "Original string:(Company_name_1):(2032/01/15*70000 )"
# (almost there) print("companyID", re.findall(r'*/(/ \.?([ \d.]+ */)/ )', text))
# (almost there) re.match(r'(ftp|http)://.*\.(jpg|png)$', s)
print("companyID", re.findall(r'\([^()]*(\):)$', text))
print("date", re.findall(r'\d+[/.-]\d+[/.-]\d{2,4}', text))
print("amount", re.findall(r'[*]\.?([ \d.]+)', text))
```
(Screenshot: the Q1 code and its output.)
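A likely reason "companyID" comes out empty: the pattern `\([^()]*(\):)$` ends with `$`, so `):` would have to be the last two characters of the string, which never happens here. Even without `$`, the capturing group `(\):)` would return the literal `):` rather than the name, because `re.findall` returns the group's contents. A minimal sketch, assuming company names never contain parentheses, that puts the group around the name instead and drops the anchor:

```python
import re

text = "Original string:(Company_name_1):(2032/01/15*70000 )"

# Capture only the text between "(" and "):". The group wraps the name,
# not the "):" delimiter, and there is no "$" anchor, so the match can
# occur anywhere in the string. Assumes names contain no parentheses.
company_ids = re.findall(r'\(([^()]*)\):', text)

# Capture the date and the amount on either side of "*" in one pass.
# Assumes a four-digit year first; \d{1,2} accepts one- or two-digit
# months and days.
pairs = re.findall(r'(\d{4}[/.-]\d{1,2}[/.-]\d{1,2})\s*\*\s*([\d.]+)', text)

print("companyID", company_ids)
print("date", [date for date, _ in pairs])
print("amount", [amount for _, amount in pairs])
```

Matching date and amount together as pairs also keeps them aligned, and `[\d.]+` (without the space the original `[ \d.]+` allowed) avoids the trailing space in the amount.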
Q2: When multiple strings come in, I can't list them properly, and some of my dates are missing from the output
For example, this is the output I expect:
```
{"Company_name_1", [{"2032/01/15" : "70000"}]}
{"Company_name_2, Subsidiary_name_1, Subsidiary_name_2", [{"2024/1/1" : "20000"} ; {"2022/11/7" : "20000"} ; {"2023/3/6" : "1000"} ; {"2023/4/20" : "20000"} ; {"2026/5/1" : "8000"} ; {"2023/5/8" : "8000"} ; {"2005/7/27" : "300000"} ; {"2023/8/22" : "80000"} ; {"2023/8/7" : "60000"} ; {"2023/11/6" : "70000"} ; {"2004/1/19" : "12000"} ; {"1998/3/14" : "16000"}]}
```
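The missing dates are consistent with the date pattern `\d+[/.-]\d+[/.-]\d{2,4}` from the Q1 code: its last part requires at least two digits, so any date with a one-digit day, such as `2024/1/1` or `2022-11-7`, never matches. A small comparison sketch, assuming dates are always written year first with a four-digit year:

```python
import re

# Assumed format: four-digit year, then a one- or two-digit month and day,
# separated by "/", "." or "-".
DATE_RE = re.compile(r'\d{4}[/.-]\d{1,2}[/.-]\d{1,2}')

sample = "2024/1/1*20000 ; 2022-11-7*20000 ; 2023/4/20*20000"

old = re.findall(r'\d+[/.-]\d+[/.-]\d{2,4}', sample)  # original pattern
new = DATE_RE.findall(sample)

print("old", old)  # only the date with a two-digit day survives
print("new", new)
```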
```python
import re

text = "Original string:(Company_name_1):(2032/01/15*70000)(Company_name_2, Subsidiary_name_1, Subsidiary_name_2):(2024/1/1*20000 ; 2022-11-7*20000 ; 2023/3/6*1000 ; 2023/4/20*20000 ; 2026/5/1*8000 ; 2023/5/8*8000 ; 2005/7/27*300000 ; 2023/8/22*80000 ; 2023/8/7*60000 ; 2023/11/6*70000 ; 2004/1/19*12000 ; 1998/3/14*16000 ; )"
# (almost there) print("companyID", re.findall(r'*/(/ \.?([ \d.]+ */)/ )', text))
# (almost there) re.match(r'(ftp|http)://.*\.(jpg|png)$', s)
print("companyID", re.findall(r'\([^()]*(\):)$', text))
print("date", re.findall(r'\d+[/.-]\d+[/.-]\d{2,4}', text))
print("amount", re.findall(r'[*]\.?([ \d.]+)', text))
```
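Running one flat `findall` per field over the whole string loses track of which dates belong to which company. One way to keep them grouped is a two-level parse: first match each `(names):(entries)` block, then split that block's entries into date/amount pairs. A minimal sketch, assuming names and entries never contain parentheses and dates are year first; `parse`, `BLOCK_RE` and `ENTRY_RE` are hypothetical names chosen for illustration:

```python
import re

text = ("Original string:(Company_name_1):(2032/01/15*70000)"
        "(Company_name_2, Subsidiary_name_1, Subsidiary_name_2):"
        "(2024/1/1*20000 ; 2022-11-7*20000 ; 2023/3/6*1000 ; 2023/4/20*20000 ; "
        "2026/5/1*8000 ; 2023/5/8*8000 ; 2005/7/27*300000 ; 2023/8/22*80000 ; "
        "2023/8/7*60000 ; 2023/11/6*70000 ; 2004/1/19*12000 ; 1998/3/14*16000 ; )")

# One match per "(names):(entries)" block; group 1 is the names,
# group 2 is everything inside the following parentheses.
BLOCK_RE = re.compile(r'\(([^()]*)\):\(([^()]*)\)')
# One match per "date*amount" entry inside a block (year-first dates assumed).
ENTRY_RE = re.compile(r'(\d{4}[/.-]\d{1,2}[/.-]\d{1,2})\s*\*\s*([\d.]+)')


def parse(text):
    """Return a list of (company_ids, [{date: amount}, ...]) tuples."""
    result = []
    for names, entries in BLOCK_RE.findall(text):
        # Normalise "-" to "/" so 2022-11-7 becomes 2022/11/7.
        records = [{date.replace('-', '/'): amount}
                   for date, amount in ENTRY_RE.findall(entries)]
        result.append((names.strip(), records))
    return result


for company_ids, records in parse(text):
    print(company_ids, records)
```

Each record is a one-key dict to mirror the expected output above; a single dict keyed by date would be more compact but would silently drop an entry if the same date appeared twice in one block.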