I am trying to process JSON Lines from an API, and I am running into an issue where requests.iter_lines() does not deliver lines in a timely manner. I now have to switch to requests.iter_content(chunk_size=1024*1024). I am trying to work out the logic needed to take an incomplete JSON line [1] at the end of one chunk and attach it to the start of the next chunk so that together they form a complete line.
My current attempt runs a series of if statements to detect an undesirable state [2], then rebuilds the line and continues processing, but I'm failing to reassemble it in all the various states this can end up in. Does someone have an example of a well-thought-out solution to this problem?
[1]
Example:
Last item from first chunk:
{'test1': 'value1', 'test2': 'valu
First item from second chunk:
e2', 'test3': 'value3'}
[2]
def incomplete_processor(main_chunk):
    # main_chunk: list of lines from one chunk; a complete line starts with '{' and ends with '\n'
    # Partial line at both the start and the end of the chunk
    if not main_chunk[0].startswith('{') and not main_chunk[-1].endswith('\n'):
        first_line = str(main_chunk[0])
        last_line = str(main_chunk[-1])
        main_chunk.pop(0)
        main_chunk.pop(-1)
        return first_line, last_line
    # Partial line only at the start of the chunk
    if not main_chunk[0].startswith('{') and main_chunk[-1].endswith('\n'):
        first_line = str(main_chunk[0])
        main_chunk.pop(0)
        return first_line
    # Partial line only at the end of the chunk
    if main_chunk[0].startswith('{') and not main_chunk[-1].endswith('\n'):
        last_line = str(main_chunk[-1])
        main_chunk.pop(-1)
        return last_line
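For comparison, one common way to avoid enumerating the states is to keep a single carry-over buffer: append each chunk to it, split on newlines, parse everything except the last piece, and keep that last piece (possibly empty) as the start of the next round. The following is only a minimal sketch under some assumptions: the function name iter_json_lines is made up for illustration, the chunks are assumed to be bytes (which is what iter_content yields by default), and each line is assumed to be valid JSON with double quotes.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_lines(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed value per complete line, carrying partial lines over.

    Hypothetical helper: `chunks` can be any iterable of bytes objects.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Every piece before the last newline is a complete line; the final
        # piece is either empty or a partial line to finish with the next chunk.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():  # skip blank lines
                yield json.loads(line)
    # The stream may end without a trailing newline.
    if buffer.strip():
        yield json.loads(buffer)
```

Assuming a response opened with stream=True, this could be called as iter_json_lines(response.iter_content(chunk_size=1024 * 1024)). Working on bytes rather than decoded text also avoids breaking a multi-byte UTF-8 character that happens to straddle two chunks, since json.loads accepts bytes directly.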