Can ijson parse and process large JSON data with parallel streaming?

Question

I understand how ijson addresses the challenges of reading and processing very large JSON documents. However, I have not been able to find any article that explains how to speed it up.

I have seen a few suggestions for this:

 1. Use a YAJL-based backend (e.g. yajl2_c)
 2. Tune the buf_size parameter (I am not seeing any significant improvement)

My question, which I suspect many others are already working on:

If we want to use the machine's parallel processing power, does ijson support that?

I know that ijson never knows the total size of the stream, so splitting the input up front is out of the question. My knowledge in this area is quite limited. Any thread, document, or link would be a good starting point for me to understand this more clearly.
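One pattern that fits this constraint, offered as a minimal sketch rather than anything ijson provides itself: keep the parsing sequential and hand batches of already-parsed items to a worker pool, so that only the per-item processing runs in parallel. The names batched, handle_batch and process_stream, as well as the "value" field, are hypothetical and chosen for illustration. The sketch only helps when the work done per record is CPU-heavy compared with the parsing.

```python
from collections import deque
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def handle_batch(batch):
    """Hypothetical per-batch work; here it just sums a 'value' field."""
    return sum(record["value"] for record in batch)


def process_stream(items, handle=handle_batch, batch_size=1000, workers=4,
                   executor_cls=ProcessPoolExecutor):
    """Consume `items` sequentially and process batches in a pool.

    At most 2 * workers batches are in flight at once, so the whole
    stream is never held in memory. Results come back in input order.
    """
    results = []
    pending = deque()
    with executor_cls(max_workers=workers) as pool:
        for batch in batched(items, batch_size):
            pending.append(pool.submit(handle, batch))
            if len(pending) >= 2 * workers:
                # Wait for the oldest batch before reading more input.
                results.append(pending.popleft().result())
        while pending:
            results.append(pending.popleft().result())
    return results
```

With ijson, the items argument could be something like ijson.items(f, "item") for a file f opened in binary mode whose top level is a JSON array. When using the default process pool, call process_stream from under an if __name__ == "__main__": guard. ThreadPoolExecutor can be passed as executor_cls when the per-batch work is I/O-bound.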

Alternative suggestion: branchless JSON: https://github.com/simdjson/simdjson — Dai, Sep 23 '22 at 06:01
Parallel processing only benefits CPU-bound tasks, whereas things like reading files or streams from disk or network are IO-bound and so don't benefit (after all, what use is a fast JSON lib that can read 2GB/s of JSON if your disk can only read at 100MB/s?). — Dai, Sep 23 '22 at 06:04
So basically you'd need to have a gargantuan JSON text blob already in memory and pre-indexed, so that separate fragments of the JSON text can be read by many different processor cores concurrently. The problem with that is that simply having a single huge text JSON (not even BSON!) is a huge sign that something has gone seriously wrong somewhere: JSON is a serialisation format, _that's it_; it is not intended for, let alone optimised for, in-memory structured data. — Dai, Sep 23 '22 at 06:07
@Dai: however, aiui, simdjson does read ndjson in parallel, by processing in two passes (the first one divides the input into separate JSONs). Their benchmarks claim that it's faster. I agree that it can't help much if you're limited to 100 MB/s, but SSD drives can be much faster and so can fast ethernet. So there probably exist contexts in which it could make sense. — rici, Sep 23 '22 at 18:05
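The two-pass idea from the last comment can be illustrated with the standard library alone. This is a sketch under stated assumptions, not simdjson's implementation: it assumes the input is ndjson (one JSON document per line) that is already in memory as bytes, and the function names split_on_newlines, parse_chunk and parse_ndjson_parallel are hypothetical. It does not apply to a single large JSON document, which has no such cheap split points.

```python
import json
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_on_newlines(data, parts):
    """Pass 1: cut `data` into roughly `parts` chunks that end on newlines."""
    chunks = []
    start = 0
    target = max(1, len(data) // parts)
    while start < len(data):
        # Look for the first newline at or after the target cut point.
        end = data.find(b"\n", min(start + target, len(data)) - 1)
        end = len(data) if end == -1 else end + 1
        chunks.append(data[start:end])
        start = end
    return chunks


def parse_chunk(chunk):
    """Pass 2 worker: parse every non-empty line of one chunk."""
    return [json.loads(line) for line in chunk.splitlines() if line.strip()]


def parse_ndjson_parallel(data, workers=4, executor_cls=ProcessPoolExecutor):
    """Parse ndjson bytes in parallel, preserving the input order."""
    chunks = split_on_newlines(data, workers)
    with executor_cls(max_workers=workers) as pool:
        parsed = pool.map(parse_chunk, chunks)
        return [record for part in parsed for record in part]
```

Whether this is faster than a single-threaded parse depends on the data: sending the parsed objects back from worker processes has its own cost, so it is worth measuring before relying on it. With the default process pool, call parse_ndjson_parallel from under an if __name__ == "__main__": guard.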
