I am hitting an API endpoint using Python requests to get some data. Here is a sample of my response.text (the actual response contains millions of event objects):

{"event":"Session","properties":{"time":"1642145186", "distinct_id":"ABC-123", "Region":"EU"}}
{"event":"Login","properties":{"time":"1642125126", "distinct_id":"ABC-123", "Region":"EU"}}
{"event":"Register","properties":{"time":"16432125126", "distinct_id":"ABC-123", "Region":"EU"}}

When I try to convert response.text to a JSON object using response.json() or json.loads(json.dumps(response.text)), it throws the following error:

json.decoder.JSONDecodeError: Extra data: 

I understand why it throws that error: response.text as a whole is not a single valid JSON document. I would like to convert that text into a JSON object. Can this be done with a regex that adds a comma at the end of each line (I am new to regex)? Or is there another option? I would really appreciate some help here.

Wiktor Stribiżew
NAB0815
  • If it's ndjson format, you can use this: https://stackoverflow.com/questions/67736164/convert-ndjson-to-json-in-python – Tomáš Šturm Jan 19 '22 at 18:40
  • I did not know that format was called ndjson, or that something like that existed. Thanks for teaching me something new today. That worked @TomášŠturm – NAB0815 Jan 19 '22 at 18:44

1 Answer

That data is ndjson.

The spec says:

Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A). The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.

Since the delimiter is a newline, you should split on newlines and decode each object individually.
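
A minimal sketch of that approach in Python, assuming the whole body is already available as a string (as with response.text). The function name parse_ndjson and the sample variable are illustrative, not part of any library:

```python
import json


def parse_ndjson(text):
    """Decode newline-delimited JSON text into a list of Python objects."""
    events = []
    # Split on "\n" only, since that is the delimiter the spec names.
    for line in text.split("\n"):
        line = line.strip()  # drops an optional trailing "\r" and stray spaces
        if not line:
            continue  # skip blank lines, e.g. after the final newline
        events.append(json.loads(line))
    return events


# Sample copied from the question; a real response would be far larger.
sample = (
    '{"event":"Session","properties":{"time":"1642145186", "distinct_id":"ABC-123", "Region":"EU"}}\n'
    '{"event":"Login","properties":{"time":"1642125126", "distinct_id":"ABC-123", "Region":"EU"}}\n'
    '{"event":"Register","properties":{"time":"16432125126", "distinct_id":"ABC-123", "Region":"EU"}}\n'
)

events = parse_ndjson(sample)
print(len(events))
print(events[0]["event"])
```

With millions of events, building one big list may use a lot of memory. One option is to request with stream=True and loop over response.iter_lines(), calling json.loads on each non-empty line as it arrives, so the full body never has to be held as a single string.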

MonkeyZeus