I'm having trouble parsing the body of a request with jsonlines. I'm using Tornado as the server, and this happens inside a post() method. My goal is to split the request body into separate JSON objects, iterate over them with a jsonlines Reader, do some work on each one, and then push them to a DB. I worked around the problem by dumping the UTF-8 encoded body into a file and then using:

with jsonlines.open("temp.txt") as reader:

That works for me. I can iterate over the entire file with

for obj in reader:

This feels like unnecessary overhead, which I could avoid if I understood what keeps me from using this bit of code instead:

log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as reader:
    for obj in reader:
        ...  # work on each obj

The exception I get is this:

jsonlines.jsonlines.InvalidLineError: line contains invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1) (line 1)

I searched for this error here, and all I found were cases where people used incorrectly formatted JSON with single quotes instead of double quotes. That is not the case for me. I debugged the request and saw that the string returned by the decode method does have double quotes around both property names and values.

Here is a sample of the request body I send (this is what it looks like in Postman):

{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}} 
{"type":"event","timestamp":"2018-03-25 09:19:51.061","event":"ScreenShown","params":{"name":"SettingsScreen"}} 
{"type":"event","timestamp":"2018-03-25 09:19:53.580","event":"ButtonClicked","params":{"screen":"SettingsScreen","button":"MissionsButton"}} 
{"type":"event","timestamp":"2018-03-25 09:19:53.615","event":"ScreenShown","params":{"name":"MissionsScreen"}}

You can reproduce the exception by using this simple bit of code in a post method and sending the lines I provided through Postman:

log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as currentlog:
    for obj in currentlog:
        print(obj)

As a sidenote: Postman sends the data as text, not JSON.

If you need any more information to answer this question, please let me know. One thing I did notice is that the string returned by the decode method starts and ends with a single quote. I guess this is because of the double quotes inside the JSON itself. Is it related in any way? An example:

'{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}}'

Thanks for any help!

Fine
Oren_C
  • Your first code example decodes the body into a variable `log`, but you read the JSON lines from the variable `msg`. Is this a typo in the question or your actual code? – Fine Apr 10 '18 at 12:41
  • Typo in the question. Fixing it. Thanks Fine! :) – Oren_C Apr 10 '18 at 12:52
  • Another possibility: [jsonlines.Reader](https://jsonlines.readthedocs.io/en/latest/#jsonlines.Reader) accepts an iterable as its argument ("The first argument must be an iterable that yields JSON encoded strings"), not a single JSON-encoded string as in your example. After `.decode("utf-8")`, `log` is a string, which happens to be iterable. So when the reader iterates over `log` under the hood, it gets the first item of the string, i.e. the character `{`, and tries to process it as a JSON line, which is obviously invalid. Try `log = log.split()` before passing `log` to the Reader. – Fine Apr 10 '18 at 12:55
  • The split was the thing I was missing! Thank you so much, Fine! Could you add this as a full answer so I can mark it as the correct one? – Oren_C Apr 10 '18 at 13:10

1 Answer

jsonlines.Reader accepts an iterable as its argument ("The first argument must be an iterable that yields JSON encoded strings"), not a single JSON-encoded string as in your example. After .decode("utf-8"), log is a string, which happens to be iterable. So when the reader iterates over log under the hood, it gets the first item of the string, i.e. the character {, and tries to process it as a JSON line, which is obviously invalid. Try log = log.split() before passing log to the Reader. Note that split() with no argument splits on any whitespace, so if values contain spaces, as the timestamps in your sample do, split on the line separator instead.
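To make the explanation above concrete, here is a minimal sketch that uses only the standard-library json module, so it runs without jsonlines installed. The variable `body` is a hypothetical stand-in for `self.request.body.decode("utf-8")`, and the two shortened events are made up for illustration. It shows that iterating a string yields single characters, that parsing the first one fails much like the reported error, and that splitting into lines first gives complete JSON documents.

```python
import json

# Hypothetical stand-in for self.request.body.decode("utf-8"): two JSON lines.
body = (
    '{"type":"event","event":"ButtonClicked"}\n'
    '{"type":"event","event":"ScreenShown"}'
)

# Iterating a str yields single characters, not lines.
first_item = next(iter(body))

# Parsing that single character fails, as it did inside the Reader.
try:
    json.loads(first_item)
    error_message = None
except json.JSONDecodeError as err:
    error_message = str(err)

# Splitting into lines first gives an iterable of complete JSON documents.
events = [json.loads(line) for line in body.splitlines()]

print(repr(first_item))
print(error_message)
print([event["event"] for event in events])
```

With jsonlines itself, the same idea would presumably read `jsonlines.Reader(log.splitlines())`, since the Reader is documented to accept an iterable of JSON-encoded strings.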

Fine
  • Just a short comment about the split: what worked for me was splitting specifically on the line separator. log.split('\n') is what fixed the problem for me. Without an explicit delimiter, split() splits on any whitespace. – Oren_C Apr 10 '18 at 16:23
  • Even better (because more generic) would be .splitlines(), but what you have also works. – wouter bolsterlee Jan 31 '19 at 18:09