-2

I'm looking to extract the id tag from the following field of data:

{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}

The regex I'm using, '"id":\s*"(.*?)"', breaks when it encounters this record.

That is because only some records carry the extra onhold tag; others look like this:

{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \n ","id":"7462764"}

The whole file is of the form:

{"info":[{"purchased_at":"","product_desc":"","id":""}{..}]}
Barbaros Özhan
Newbie

2 Answers

1

Rather than using a regular expression, you can use the json library to parse the record and read the value of the id key directly:

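import json

# Named `record` rather than `str`, which would shadow the built-in type.
record = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'

js = json.loads(record)

# The parsed record is a dict, so the key can be looked up directly.
print(js['id'])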

>>>
8745485   
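For the whole file, which the question describes as {"info":[...]}, the same idea extends to every record. The following is only a minimal sketch: the two-record sample and the helper name extract_ids are assumptions made for illustration, and records without an id key are simply skipped.

```python
import json

# Hypothetical sample mirroring the {"info": [...]} layout from the question.
# A raw string keeps the \n inside product_desc as a JSON escape sequence.
raw = r'''{"info":[
 {"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"},
 {"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"All clear 2019 \n ","id":"7462764"}
]}'''


def extract_ids(text):
    """Return the top-level id of every record under the "info" key."""
    data = json.loads(text)
    # Only the record's own "id" is read, so nested objects such as
    # "onhold" cannot interfere with the result.
    return [record["id"] for record in data["info"] if "id" in record]


print(extract_ids(raw))
```

Because the parser understands the nesting, the presence or absence of the onhold object makes no difference to the lookup.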

Update: If you need a regular expression instead, the search function of the re library with a suitable pattern works:

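import re
record = '{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'

# Raw string; the leading quote anchors the match to the "id" key itself,
# and \s* tolerates optional whitespace after the colon.
s = re.search(r'"id":\s*"(.+?)"', record)

if s:
    print(s.group(1))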

>>>
8745485 
Barbaros Özhan
0

Use the findall method of the re module to extract the data.

import re
line='{"purchased_at":"2020-04-21T05:55:30.000Z","product_desc":"Garnier 2019 Shampoo","onhold":{"copyright":true,"country_codes":["ABC"],"scope":"poss"},"id":"8745485"}'
# A raw string avoids the invalid-escape warning that '\s' triggers in a normal string.
print(re.findall(r'"id":\s*"(.*?)"', line))

Output

['8745485']
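One caveat when running findall over a whole file: the pattern matches every "id" key in the text, including any nested inside another object. The sketch below illustrates this with a made-up two-record sample; the nested "id" inside onhold and the name find_ids are assumptions for illustration, not part of the original data.

```python
import re

# Hypothetical sample: the second record nests an extra "id" inside "onhold".
text = r'{"info":[{"product_desc":"All clear 2019 \n ","id":"7462764"},{"onhold":{"id":"inner"},"id":"8745485"}]}'

# Same pattern as above, compiled once for reuse.
ID_PATTERN = re.compile(r'"id":\s*"(.*?)"')


def find_ids(s):
    """Return the value of every "id" key found anywhere in the text."""
    return ID_PATTERN.findall(s)


print(find_ids(text))
```

Here the nested value is returned alongside the two record ids, so if nested ids can occur in the real data, parsing with json is likely the safer route.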
Liju