
I have a dataset containing invalid JSON; see the snippet below:

{'id': 613, 'name': "new year's eve"}

I want to replace all the single quotes except apostrophes, such as the one in new year's. The string above should then become valid JSON:

{"id": 613, "name": "new year's eve"}

I tried a simple string replace in Python, string.replace("'", "\""), but this also changes the apostrophe, resulting in:

{"id": 613, "name": "new year"s eve"}

Is there a way to fix this with a regex, e.g. replace every ' except when it is enclosed in double quotes?

user3190748

2 Answers


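You can use the ast module: ast.literal_eval parses the text as a Python literal, so both single- and double-quoted strings are accepted.

Example: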

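import ast
import json

s = """{'id': 613, 'name': "new year's eve"}"""
d = ast.literal_eval(s)  # a dict; literal_eval accepts both quote styles
print(json.dumps(d))     # re-serialised as JSON: {"id": 613, "name": "new year's eve"}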
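Since the data is a large third-party dataset, it may help to see the same idea applied record by record. This is a minimal sketch that assumes, hypothetically, one dict literal per line; convert_lines is an illustrative name, not part of any library. Note that ast.literal_eval also understands True/False/None, which json.dumps writes as true/false/null, and it raises ValueError or SyntaxError on a malformed record.

```python
import ast
import json


def convert_lines(lines):
    """Convert Python-literal records into JSON strings.

    Hypothetical layout: each non-blank line holds exactly one dict literal.
    """
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = ast.literal_eval(line)  # accepts both ' and " quoting
        converted.append(json.dumps(record))  # always emits double quotes
    return converted


sample = [
    """{'id': 613, 'name': "new year's eve"}""",
    "",
    """{'id': 614, 'name': 'christmas'}""",
]
for out in convert_lines(sample):
    print(out)
```

Because the function only iterates over its argument, an open file object can be passed in place of the list; for very large inputs, writing each converted line out immediately instead of collecting a list would keep memory use flat.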
Rakesh

You could try quoting only the dictionary keys, with the pattern

'(\w+)'\s*:

See a demo on regex101.com.


In Python:
import json
import re

string = """{'id': 613, 'name': "new year's eve"}"""

# Match a single-quoted word followed by a colon, i.e. a dict key.
rx = re.compile(r"""'(\w+)'\s*:""")
string = rx.sub(r'"\1":', string)  # 'id': becomes "id":
d = json.loads(string)
print(d)

This yields

{'id': 613, 'name': "new year's eve"}
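The pattern above only touches keys: a single-quoted value such as 'christmas' would still be invalid JSON, and key-like text inside a double-quoted value (e.g. 'foo':) would be rewritten too. One way to get the "replace every ' except inside double quotes" behaviour the question asks for is to tokenize string literals instead: a single regex alternation matches either a whole double-quoted or a whole single-quoted literal, so the apostrophe in "new year's eve" is consumed as part of that token and never treated as a delimiter. Each token is then evaluated with ast.literal_eval and re-emitted with json.dumps. This is a sketch under assumptions: strings follow Python escape rules, and no quote characters appear outside string literals. STRING_RX and fix_quotes are hypothetical names.

```python
import ast
import json
import re

# A double-quoted or a single-quoted string literal; a backslash escapes
# the next character, so \' or \" does not end the literal early.
DOUBLE = r'"(?:[^"\\]|\\.)*"'
SINGLE = r"'(?:[^'\\]|\\.)*'"
STRING_RX = re.compile(DOUBLE + "|" + SINGLE)


def fix_quotes(text):
    """Re-emit every string literal in text with JSON double quoting.

    Apostrophes inside a double-quoted literal are swallowed by the DOUBLE
    branch, so they are never mistaken for string delimiters.
    """
    return STRING_RX.sub(
        # Evaluate the literal as Python, then serialise it as JSON.
        lambda m: json.dumps(ast.literal_eval(m.group(0)), ensure_ascii=False),
        text,
    )


print(fix_quotes("""{'id': 613, 'name': "new year's eve"}"""))
# {"id": 613, "name": "new year's eve"}
```

Unlike calling ast.literal_eval on the whole record, this leaves non-string tokens alone, so a record that already mixes in JSON's true or null still converts.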

Better yet: where does this string come from in the first place?

Jan
  • `{'id': 613, 'name': "here is the content of my string 'foo': 'bar'"}` – Cid Feb 13 '19 at 09:18
  • Thanks for the reply. This doesn't work because it matches the text in between quotes. I only want to change ' to " except when ' is used in a word like **year's**. – user3190748 Feb 13 '19 at 09:19
  • @Cid: I know very well that it is not perfect. – Jan Feb 13 '19 at 09:19
  • @user3190748: Yes, it does. The printed `d` is already the parsed dictionary. You could print out the string beforehand if you want to see that it has changed. – Jan Feb 13 '19 at 09:19
  • *"Better yet: where does this string come from in the first place?"* This question **should** be answered. You shouldn't have to fix invalid JSON once you've received it. – Cid Feb 13 '19 at 09:20
  • @Cid: That's the point, indeed. OP might not have access to the original data-processing pipeline, though. – Jan Feb 13 '19 at 09:22
  • The JSON comes from a huge dataset I received from a third party, and I don't have access to the processing pipeline. – user3190748 Feb 13 '19 at 09:25
  • I wish I could remember from whence, but the same problem crossed my desk a couple of years ago. It was third-party data over which I had no control. In my case a simple `.replace("'", '"')` sufficed (no embedded quotes in the strings). – nigel222 Feb 13 '19 at 09:43
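Cid's counterexample from the comments can be run against both answers to see the difference. This sketch reuses the calls from the two answers on that input; json.dumps is added to the ast route so that it produces JSON text, and the variable names are illustrative.

```python
import ast
import json
import re

# Cid's counterexample: text that looks like a key sits inside a value.
s = """{'id': 613, 'name': "here is the content of my string 'foo': 'bar'"}"""

# ast answer: parse as a Python literal, then serialise as JSON.
via_ast = json.dumps(ast.literal_eval(s))
print(via_ast)

# Regex answer: quote every 'word': pattern, wherever it occurs.
rx = re.compile(r"""'(\w+)'\s*:""")
via_regex = rx.sub(r'"\1":', s)
print(via_regex)  # 'foo': inside the value was rewritten as well

try:
    json.loads(via_regex)
    regex_ok = True
except json.JSONDecodeError as exc:
    regex_ok = False
    print("json.loads rejected the regex output:", exc)
```

On this record the ast route yields valid JSON with the value intact, while the key-only regex inserts double quotes inside the value and json.loads refuses the result.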