Escaping in regex expression python

Question

I'm looking to extract the id tag from the following field of data:

{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}

The regex I'm using, '"id":\s*"(.*?)"', breaks when this field is encountered.

Only some records have the extra onhold tag; others look like this:

{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \n ","id":"7462764"}

The whole file is of the form:

{"info":[{"purchased_at":"","product_desc":"","id":""}{..}]}

This looks like JSON, you should use the `json` module, not regex. — mkrieger1, Sep 06 '20 at 07:05
And the regex question is unclear. What exactly do you mean by "breaks"? — mkrieger1, Sep 06 '20 at 07:10
@mkrieger1 ID = re.search(id_pattern, match.group(0)) when I try this I get Nonetype has no group object. — Newbie, Sep 06 '20 at 07:28
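
The error in the last comment is what Python reports when `re.search` finds no match: it returns `None`, and calling `.group()` on `None` fails. A minimal sketch of guarding against that, assuming the pattern from the question; `extract_id` is a hypothetical helper name, and the second input is made up for illustration:

```python
import re

# Raw string, so the backslash in \s reaches the regex engine unchanged
id_pattern = r'"id":\s*"(.*?)"'

def extract_id(text):
    """Return the captured id, or None when the pattern does not match."""
    match = re.search(id_pattern, text)
    return match.group(1) if match else None

# Record from the question: the pattern matches
print(extract_id('{"product_desc":"All clear 2019 \\n ","id":"7462764"}'))

# Made-up text with no id key: re.search returns None, and so does the helper
print(extract_id('{"product_desc":"no id here"}'))
```

Checking the match object before calling `.group()` does not explain why a given input failed to match, but it turns the crash into a value that can be inspected.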

Barbaros Özhan · Answer 1 · 2020-09-06T08:32:36.593

1

You can use the json library to extract the value for the key id directly, rather than using a regular expression:

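import json

# Avoid naming the variable `str`, which would shadow the built-in type
data = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'

js = json.loads(data)  # parse the JSON text into a dict

# Look the key up directly instead of looping over every key
print(js['id'])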

>>>
8745485
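
The question says the whole file wraps the records in an "info" list, so the same approach extends to every record. A minimal sketch, assuming the file content is valid JSON of that shape; the sample text is assembled from the two records in the question:

```python
import json

# Assumed sample: both records from the question inside the "info" list
text = (
    '{"info":['
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo",'
    '"onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"},'
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \\n ","id":"7462764"}'
    ']}'
)

doc = json.loads(text)

# Take the top-level id of each record; the nested onhold object is not touched
ids = [record["id"] for record in doc["info"]]
print(ids)
```

For a file on disk, `json.load(f)` on the open file object would replace `json.loads(text)`.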

Update: If you need a regular expression, the search function of the re module with a suitable pattern might help:

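import re

data = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'

# Raw string; the leading quote keeps the pattern from matching keys that merely end in id
s = re.search(r'"id":\s*"(.+?)"', data)

# re.search returns None when nothing matches, so test before calling group()
if s:
    print(s.group(1))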

>>>
8745485
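
A pattern that omits the opening quote before id can also match a longer key that happens to end in id. A small sketch of the difference; the record and its "paid" key are made up for illustration and do not come from the question's data:

```python
import re

# Made-up record with a hypothetical "paid" key placed before "id"
record = '{"paid":"yes","id":"8745485"}'

loose = re.search(r'id":"(.+?)"', record)       # no opening quote before id
strict = re.search(r'"id":\s*"(.+?)"', record)  # opening quote, optional whitespace

print(loose.group(1))
print(strict.group(1))
```

Here the loose pattern captures yes from the "paid" key, while the strict one captures 8745485.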

edited Sep 06 '20 at 08:32

answered Sep 06 '20 at 07:15

Barbaros Özhan


I understand. But I'm looking for a regex answer! – Newbie Sep 06 '20 at 07:27
It needs to be said again: *don't* use regex for this. – tripleee Sep 06 '20 at 10:04
So the first part is OK and I should delete the **update** part; is that what you mean, @tripleee? – Barbaros Özhan Sep 06 '20 at 10:08
Nah, just trying to tell the OP that they definitely don't want what they insist that they want. – tripleee Sep 06 '20 at 10:10

Liju · Answer 2 · 2020-09-06T21:48:42.210

0

Use the findall function of the re module to extract the data.

import re
line = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'
# Raw string, so \s is passed to the regex engine without an invalid-escape warning
print(re.findall(r'"id":\s*"(.*?)"', line))

Output

['8745485']
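
Because `re.findall` returns every non-overlapping match, it can also be run once over the whole file text. A minimal sketch, assuming the file follows the "info" layout from the question; the sample text is assembled from the question's two records:

```python
import re

# Assumed sample: both records from the question inside the "info" list
text = (
    '{"info":['
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo",'
    '"onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"},'
    '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \\n ","id":"7462764"}'
    ']}'
)

# One pass over the text; each match contributes its captured group
ids = re.findall(r'"id":\s*"(.*?)"', text)
print(ids)
```

The pattern has no notion of JSON nesting, so an "id" key inside a nested object would be collected as well.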

edited Sep 06 '20 at 21:48

answered Sep 06 '20 at 07:43

Liju

