I'm trying to extract data from a string and output it as "companyID", "date" and "amount".
I have read through "Python regex: matching a parenthesis within parenthesis" and can now output my date and amount successfully, but I'm stuck on two things:

1. The "companyID" is not output correctly
2. When multiple strings come in, I can't list them properly

Q1: The "companyID" is not output correctly

import re


text = "Original string:(Company_name_1):(2032/01/15*70000 )"

# (almost there) print ( "companyID", ( re.findall (r'*/(/ \.?([ \d.]+ */)/  )',text)))
# (almost there) re.match(r'(ftp|http)://.*\.(jpg|png)$', s)

print ( "companyID", ( re.findall (r'\([^()]*(\):)$',text)))

print ( "date", ( re.findall (r'\d+[/.-]\d+[/.-]\d{2,4}',text)))

print ( "amount", ( re.findall (r'[*]\.?([ \d.]+)',text)))

(Screenshot for Q1: the code and its output)

Q2: When multiple strings come in, I can't list them properly, and some of my dates are missing from the output

ex:
{"Company_name_1",[{"2032/01/15" : "70000" }]}

{"Company_name_2, Subsidiary_name_1, Subsidiary_name_2",[{"2024/1/120000"} ; {"2022/11/720000"} ; {"2023/3/61000"} ; {"2023/4/2020000 "}; {"2026/5/18000 "};{" 2023/5/88000"} ; {"2005/7/27300000 "}; {"2023/8/2280000"} ; {"2023/8/760000"} ; {"2023/11/670000"} ; {"2004/1/1912000"} ; {"1998/3/1416000 "} ]}

import re


text = "Original string:(Company_name_1):(2032/01/15*70000)(Company_name_2, Subsidiary_name_1, Subsidiary_name_2):(2024/1/1*20000 ; 2022-11-7*20000 ; 2023/3/6*1000 ; 2023/4/20*20000 ; 2026/5/1*8000 ; 2023/5/8*8000 ; 2005/7/27*300000 ; 2023/8/22*80000 ; 2023/8/7*60000 ; 2023/11/6*70000 ; 2004/1/19*12000 ; 1998/3/14*16000 ; )"

# (almost there) print ( "companyID",( re.findall (r'*/(/ \.?([ \d.]+ */)/  )',text)))
# (almost there) re.match(r'(ftp|http)://.*\.(jpg|png)$', s)

print ( "companyID", ( re.findall (r'\([^()]*(\):)$',text)))

print ( "date", ( re.findall (r'\d+[/.-]\d+[/.-]\d{2,4}',text)))

print ( "amount", ( re.findall (r'[*]\.?([ \d.]+)',text)))

(Screenshot for Q2: the code and its output)

1 Answer

Suggestions:

  1. Debug the pattern in a regex tool first, then move it into Python
  2. Preferably avoid spaces in regular expressions
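
Once the pattern behaves in the tool, the Python side for Q2 could look roughly like the sketch below. It assumes every record has the shape `(names):(date*amount ; date*amount ; ...)`, as in the question's string; the sample text is shortened, and the names `GROUP`, `ENTRY` and `parse_groups` are made up for illustration. Dates such as `2024/1/1` go missing with `\d+[/.-]\d+[/.-]\d{2,4}` because the last part demands at least two digits, so this sketch uses `\d{1,2}` for month and day instead.

```python
import re

# Shortened sample in the same shape as the string from the question.
text = ("Original string:(Company_name_1):(2032/01/15*70000)"
        "(Company_name_2, Subsidiary_name_1):"
        "(2024/1/1*20000 ; 2022-11-7*20000 ; 2023/4/20*20000 ; )")

# One "(names):(entries)" record; neither part may contain parentheses.
GROUP = re.compile(r'\(([^()]*)\):\(([^()]*)\)')
# One "date*amount" entry; month and day may have one or two digits.
ENTRY = re.compile(r'(\d{4}[/.-]\d{1,2}[/.-]\d{1,2})\s*\*\s*(\d+)')


def parse_groups(s):
    """Map each company-name string to its list of (date, amount) pairs."""
    result = {}
    for match in GROUP.finditer(s):
        company = match.group(1).strip()
        body = match.group(2)
        result[company] = ENTRY.findall(body)
    return result


parsed = parse_groups(text)
for company, entries in parsed.items():
    print(company, entries)
```

Matching each record first and only then searching inside its second pair of parentheses keeps every date and amount attached to the right company, which three separate `findall` calls over the whole string cannot do.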
BugMaker
  • Hi @BugMaker, I have been using the online regex tool https://regex101.com/ since yesterday, but I haven't figured out the solution yet. I will remove the spaces later. If I just want the company name to come out, how can I correct the pattern? –  Sep 22 '22 at 03:14
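
For the company name alone, here is a minimal sketch using the Q1 string. The original pattern returns nothing for two reasons: the `$` anchor requires `):` to sit at the very end of the string, and the capture group wraps `):` rather than the name. Dropping the anchor and moving the group around `[^()]*` seems to be enough here; the variable name `company_ids` is just illustrative.

```python
import re

text = "Original string:(Company_name_1):(2032/01/15*70000 )"

# Original attempt: "$" forces "):" to end the string, so nothing matches.
print("original :", re.findall(r'\([^()]*(\):)$', text))  # []

# Capture the text between "(" and "):" instead, with no end anchor.
company_ids = re.findall(r'\(([^()]*)\):', text)
print("companyID:", company_ids)  # ['Company_name_1']
```

The second pair of parentheses is not picked up because its closing `)` is not followed by `:`.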