I have a large CSV file in which each row is a single line like the one below (shown pretty-printed here for readability):
id_85,
{
  "link": "some link",
  "icon": "hello.gif",
  "name": "Wall Photos",
  "comments": {
    "count": 0
  },
  "updated_time": "2012-03-12",
  "object_id": "400",
  "is_published": true,
  "properties": [
    {
      "text": "University",
      "name": "By",
      "href": "some link"
    }
  ],
  "from": {
    "id": "7778",
    "name": "Let"
  },
  "message": "Hello World! :D",
  "id": "id_85",
  "created_time": "2012-03-12",
  "to": {
    "data": [
      {
        "id": "100",
        "name": "March"
      }
    ]
  },
  "message_tags": {
    "0": [
      {
        "id": "100",
        "type": "user",
        "name": "Marcelo",
        "length": 7,
        "offset": 0
      }
    ]
  },
  "type": "photo",
  "caption": "Hello world!"
}
I am trying to extract just the JSON part: everything from the first opening curly bracket to the last closing one.
Here is my Python regex code so far:
```python
import re

# Sample CSV line; single-quoted because it contains double quotes
line = 'id_85,{"link": "some link", "icon": "hello.gif", "name": "Wall Photos", "comments": {"count": 0}, "updated_time": "2012-03-12", "object_id": "400", "is_published": true, "properties": [{"text": "University", "name": "By", "href": "some link"}], "from": {"id": "777", "name": "Let"}, "message": "Hello World! :D", "id": "id_85", "created_time": "2012-03-12", "to": {"data": [{"id": "100", "name": "March"}]}, "message_tags": {"0": [{"id": "100", "type": "user", "name": "March", "length": 7, "offset": 0}]}, "type": "photo", "caption": "Hello world!"}'
m = re.match(r'.*,({.*}$)', line)
if m:
    print(m.group(1))
```
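A small demonstration of how this pattern can pick the wrong bracket: the greedy `.*,` first runs to the end of the line and then backtracks, so the capture starts at the *last* `,{` after which `{.*}$` can still match, not the first one. The line used here is hypothetical (not from the actual file); it contains an array of objects written as `},{` with no space after the comma.

```python
import re

# Hypothetical line: the JSON holds an array whose objects are separated by "},{"
line = 'id_86,{"name": "Album", "items": [{"id": "1"},{"id": "2"}]}'

m = re.match(r'.*,({.*}$)', line)
# The greedy ".*," backtracked only to the LAST ",{", so only a fragment is captured
print(m.group(1))  # prints {"id": "2"}]}
```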
In some cases the match does not start at the first { and end at the last }; it captures some other { ... } block instead. How do I make sure the captured text runs from the first opening curly bracket to the last closing one, and nothing else?
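One possible approach, sketched below: drop the `.*,` prefix and let `re.search` find the leftmost `{`; the greedy `.*` then extends to the last `}` on the line. A plain-string alternative using `str.find` and `str.rfind` gives the same result without a regex. Both assume the leading id never contains a `{`, which holds for `id_85`. The function names and the sample line are hypothetical, chosen for illustration.

```python
import re

def extract_json_regex(line):
    """Return the text from the first '{' to the last '}' (inclusive), or None."""
    # search() starts at the leftmost '{'; the greedy .* then runs to the last '}'.
    # re.DOTALL lets '.' also match newlines, in case a record spans several lines.
    m = re.search(r'\{.*\}', line, re.DOTALL)
    return m.group(0) if m else None

def extract_json_slice(line):
    """Same result without a regex, using plain string search."""
    start = line.find('{')   # index of the first '{', or -1
    end = line.rfind('}')    # index of the last '}', or -1
    if start == -1 or end < start:
        return None
    return line[start:end + 1]

# Hypothetical line with "},{" inside the JSON and trailing whitespace
line = 'id_86,{"name": "Album", "items": [{"id": "1"},{"id": "2"}]} '
print(extract_json_regex(line))  # {"name": "Album", "items": [{"id": "1"},{"id": "2"}]}
print(extract_json_slice(line))  # same output
```

The slicing version avoids regex backtracking entirely, which may matter on a large file; the regex version is closer to the original code.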
The desired output looks like this:
{"link": "some link", "icon": "hello.gif", "name": "Wall Photos", "comments": {"count": 0}, "updated_time": "2012-03-12", "object_id": "400", "is_published": true, "properties": [{"text": "University", "name": "By", "href": "some link"}], "from": {"id": "777", "name": "Let"}, "message": "Hello World! :D", "id": "id_85", "created_time": "2012-03-12", "to": {"data": [{"id": "100", "name": "March"}]}, "message_tags": {"0": [{"id": "100", "type": "user", "name": "March", "length": 7, "offset": 0}]}, "type": "photo", "caption": "Hello world!"}
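To confirm that the extracted text is the complete object rather than a fragment, it can be passed to `json.loads`, which raises `ValueError` (its `JSONDecodeError` subclass) on a fragment such as `{"id": "2"}]}`. The sketch below assumes every row has the form `<id>,<json>` on one physical line and that the id contains no comma, so `str.split(',', 1)` separates the two parts. `csv.reader` is a poor fit here, because the JSON field is not quoted and would be broken apart at its internal commas. The function name `parse_rows` and the shortened sample line are hypothetical.

```python
import json

def parse_rows(lines):
    """Yield (row_id, record) pairs from lines shaped like '<id>,<json>'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only at the FIRST comma: everything after it is the JSON text.
        # A line with no comma at all raises ValueError when unpacking.
        row_id, json_text = line.split(',', 1)
        # json.loads raises ValueError unless json_text is one complete JSON value
        record = json.loads(json_text)
        yield row_id, record

# Shortened version of the sample line
sample = ['id_85,{"id": "id_85", "type": "photo", "is_published": true}\n']
for row_id, record in parse_rows(sample):
    print(row_id, record["type"], record["is_published"])  # id_85 photo True
```

With a real file, the same generator could be fed an open file object, e.g. `with open('data.csv') as f:` followed by `for row_id, record in parse_rows(f):`; the file name `data.csv` is hypothetical.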
Thanks!