I understand how ijson addresses the challenge of reading and processing very large JSON documents. However, I have not been able to find any article that explains how to speed it up.

I have seen a few suggestions for achieving that:

 1. Use the YAJL backend
 2. Tune the buf_size parameter (I am not seeing any significant improvement)

The question I have in mind, which I guess many others are already working on:

  • If we want to use the parallel processing power of the machine, does ijson support that?

I know that at no point does ijson know the size of the entire stream, so splitting the input up front seems to be out of the question. My knowledge in this area is quite limited. Any thread, document, or link would be a good starting point for me to understand this more clearly.
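
To make the question concrete, one arrangement that does not require splitting the stream is to keep parsing in a single process and hand batches of parsed objects to a worker pool. The following is only a sketch under that assumption, not something ijson provides itself: `iter_items`, `batched`, `count_items`, and `process_stream` are hypothetical names, and only `ijson.items` is a real ijson call.

```python
import concurrent.futures as cf
from collections import deque
from itertools import islice


def iter_items(path, prefix="item"):
    """Stream objects from a large JSON file (parsing stays single-threaded)."""
    import ijson  # imported lazily so the rest of the sketch needs only the stdlib

    with open(path, "rb") as f:
        yield from ijson.items(f, prefix)


def batched(iterable, size):
    """Yield lists of up to `size` consecutive items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def count_items(batch):
    """Placeholder for the real CPU-heavy per-batch work."""
    return len(batch)


def process_stream(items, worker, batch_size=1000, max_pending=8,
                   executor_cls=cf.ProcessPoolExecutor):
    """Run `worker` on batches, keeping at most `max_pending` batches in flight."""
    results = []
    pending = deque()
    with executor_cls() as pool:
        for batch in batched(items, batch_size):
            pending.append(pool.submit(worker, batch))
            if len(pending) >= max_pending:
                # Wait for the oldest batch so memory use stays bounded.
                results.append(pending.popleft().result())
        while pending:
            results.append(pending.popleft().result())
    return results
```

A call would look like `process_stream(iter_items("big.json"), count_items)`, where `big.json` is a placeholder path. This only helps if the per-object work costs more than the parsing, since the parsing itself still runs on one core.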

Shubham Chauhan
  • Alternative suggestion: branchless JSON: https://github.com/simdjson/simdjson – Dai Sep 23 '22 at 06:01
  • Parallel processing only benefits CPU-bound tasks, whereas things like reading files or streams from disk or network are IO-bound and so don't benefit (after all, what use is a fast JSON lib that can read 2GB/s of JSON if your disk can only read at 100MB/s?). – Dai Sep 23 '22 at 06:04
  • So basically you'd need to have a gargantuan JSON text blob already in memory and pre-indexed, so that separate fragments of the JSON text can be read by many different processor cores concurrently. The problem with that is that simply having a single huge text JSON (not even BSON!) is a huge sign that something has gone seriously wrong somewhere: JSON is a serialisation format, _that's it_; it is not intended for, let alone optimised for, in-memory structured data. – Dai Sep 23 '22 at 06:07
  • @Dai: however, as I understand it, simdjson does read ndjson in parallel, by processing in two passes (the first one divides the input into separate JSON documents). Their benchmarks claim that it's faster. I agree that it can't help much if you're limited to 100 MB/s, but SSD drives can be much faster and so can fast ethernet. So there probably exist contexts in which it could make sense. – rici Sep 23 '22 at 18:05
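
The two-pass idea from the last comment can be illustrated with the standard library alone. This is a rough sketch of the general approach, not simdjson's actual implementation; `parse_lines` and `parse_ndjson_parallel` are hypothetical names, and the input is assumed to be ndjson (one JSON document per line) already in memory.

```python
import concurrent.futures as cf
import json


def parse_lines(lines):
    """Pass 2: parse one group of ndjson lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def parse_ndjson_parallel(data, chunks=4, executor_cls=cf.ProcessPoolExecutor):
    """Split ndjson into line groups, then parse the groups concurrently."""
    # Pass 1: find document boundaries (cheap, since each line is one document).
    lines = data.splitlines()
    size = max(1, -(-len(lines) // chunks))  # ceiling division
    parts = [lines[i:i + size] for i in range(0, len(lines), size)]

    with executor_cls() as pool:
        # map() preserves input order, so documents come back in file order.
        return [doc for part in pool.map(parse_lines, parts) for doc in part]
```

With the default process pool, each group of lines is pickled and sent to a worker, so any gain depends on parsing costing more than that transfer.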
