
I am processing JSON Lines from an API, and I am running into an issue where requests' Response.iter_lines() does not yield lines promptly, so I now need to switch to Response.iter_content(chunk_size=1024*1024). I am trying to work out the logic needed to take an incomplete JSON line[1] at the end of one chunk and attach it to the start of the next chunk so that it forms a complete line.

My current attempt runs a series of if statements to detect an undesirable state[2], then rebuilds the line and continues processing, but I'm failing to reassemble it in all the states it can end up in. Does anyone have an example of a well-thought-out solution to this problem?
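
One alternative to enumerating the states is a single carry-over buffer: append each chunk to it, split on '\n', parse every complete line, and hold back whatever follows the last newline until the next chunk arrives. The sketch below assumes the chunks are already-decoded str objects and that records are separated by '\n'; iter_jsonlines is a hypothetical helper name, not part of requests.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonlines(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed object per complete line, across arbitrary chunk boundaries."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last "\n" is complete; the remainder is a
        # partial line that is carried over to be joined with the next chunk.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            if line.strip():  # skip blank lines
                yield json.loads(line)
    # A final record without a trailing newline is still parsed.
    if buffer.strip():
        yield json.loads(buffer)
```

Because the buffer always starts with the held-back fragment, the start of each new chunk is joined automatically and only the text after the last newline can be incomplete, so the separate first-line and last-line cases collapse into one rule.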

[1]

Example:

Last item from first chunk:

{'test1': 'value1', 'test2': 'valu

first item from second chunk:

e2', 'test3': 'value3'}
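
A small illustration of why the fragment needs joining: on its own it is not valid JSON, so json.loads raises json.JSONDecodeError (a subclass of ValueError), while the concatenation of the two pieces parses. The sample above uses Python-style single quotes, which json.loads rejects, so this sketch assumes the API actually sends double-quoted JSON.

```python
import json

end_of_first_chunk = '{"test1": "value1", "test2": "valu'
start_of_second_chunk = 'e2", "test3": "value3"}'

# The fragment alone fails to parse; JSONDecodeError subclasses ValueError.
try:
    json.loads(end_of_first_chunk)
except json.JSONDecodeError as exc:
    print("incomplete:", exc.msg)

# Joining the two pieces restores the original line.
record = json.loads(end_of_first_chunk + start_of_second_chunk)
print(record)
```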

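[2]

def incomplete_processor(main_chunk):
    # main_chunk is the list of lines from one chunk. A fragment at the start is
    # the tail of the previous chunk's last line; a fragment at the end (no
    # trailing '\n') is the head of a line that continues in the next chunk.

    # Both ends are fragments.
    if not main_chunk[0].startswith('{') and not main_chunk[-1].endswith('\n'):
        first_line = main_chunk.pop(0)
        last_line = main_chunk.pop(-1)

        return first_line, last_line

    # Only the first line is a fragment.
    if not main_chunk[0].startswith('{') and main_chunk[-1].endswith('\n'):
        first_line = main_chunk.pop(0)

        return first_line

    # Only the last line is a fragment.
    if main_chunk[0].startswith('{') and not main_chunk[-1].endswith('\n'):
        last_line = main_chunk.pop(-1)

        return last_line

    # Otherwise every line is complete and the function returns None.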
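
iter_content() yields bytes by default, and a fixed chunk_size can cut a multi-byte UTF-8 character in half just as easily as it cuts a JSON line. A sketch that sidesteps both, assuming the stream is UTF-8 encoded JSON Lines: buffer raw bytes, split on b"\n" (byte 0x0A never occurs inside a multi-byte UTF-8 sequence, and valid JSON strings cannot contain a raw newline), and decode only complete lines. iter_jsonlines_bytes is a hypothetical name.

```python
import json
from typing import Any, Iterable, Iterator


def iter_jsonlines_bytes(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Reassemble UTF-8 JSON Lines from raw byte chunks of any size."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # b"\n" (0x0A) never appears inside a multi-byte UTF-8 sequence,
        # so splitting here cannot cut a character in half.
        *complete, buffer = buffer.split(b"\n")
        for line in complete:
            if line.strip():  # skip blank lines
                yield json.loads(line.decode("utf-8"))
    # Parse a final record that has no trailing newline.
    if buffer.strip():
        yield json.loads(buffer.decode("utf-8"))
```

One way to wire it up, assuming a request made with stream=True, would be iter_jsonlines_bytes(response.iter_content(chunk_size=None)); the requests documentation says that with stream=True a chunk_size of None reads data as it arrives, in whatever size the chunks are received, rather than waiting for a fixed-size chunk to fill.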

1 Answer


I solved this problem by converting the list from my original rsplit('\n') into a deque and catching the ValueError that json.loads() raises for an incomplete JSON line. I store the first fragment that fails to parse (the head of the split line), wait for the next fragment to fail (its tail), and then concatenate the two.

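import json
from collections import deque

# main_chunk_deque holds the lines of one chunk, e.g. deque(chunk_text.rsplit('\n')).
# next_line must be defined once, before the first chunk, and kept between
# chunks, because the head of a split record arrives one chunk before its tail.
while True:
    try:
        jsonline = main_chunk_deque.popleft()
    except IndexError:
        break  # this chunk's deque is empty
    if not jsonline:
        continue  # skip the empty string left by a trailing '\n'
    try:
        record = json.loads(jsonline)
    except ValueError:
        if not jsonline.endswith('}'):
            next_line = jsonline  # head of a split record: hold it
            continue
        if jsonline.startswith('{'):
            continue  # looks complete but is still invalid: skip it
        record = json.loads(next_line + jsonline)  # tail: join with the held head
    # ... process record here ...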