Which technology would be best for importing a large number of large JSON Lines files (approx. 2 GB per file)?

I am thinking about Solr.

Once the data is imported, it will have to be queryable.

Which technology would you suggest for importing and then querying JSON Lines data in a timely manner?

DamianPawski

1 Answer

You can start prototyping with whatever scripting language you prefer: read the lines, massage the format as needed to get valid Solr JSON, and send it to Solr via HTTP. That would be the fastest way to get going.
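A minimal sketch of that prototype in Python, using only the standard library. It assumes each line is already a flat JSON object that matches your Solr schema (otherwise transform it before batching). The URL, the collection name `mycollection`, the `commitWithin` value and the batch size are all placeholders, not values from your setup.

```python
import json
import urllib.request
from typing import Callable, Iterable, Iterator

# Assumed endpoint: adjust host, port and collection name to your installation.
# commitWithin asks Solr to make the documents searchable within 10 seconds.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commitWithin=10000"


def batches(lines: Iterable[str], size: int) -> Iterator[list]:
    """Parse JSON Lines lazily and yield lists of at most `size` documents."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly smaller, batch


def post_to_solr(docs: list, url: str = SOLR_UPDATE_URL) -> None:
    """Send one batch to Solr's update handler as a JSON array of documents."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # urlopen raises on HTTP error status codes


def index_file(path: str,
               send: Callable[[list], None] = post_to_solr,
               size: int = 1000) -> int:
    """Stream a JSON Lines file to `send` in batches; return the document count."""
    total = 0
    with open(path, encoding="utf-8") as handle:  # reads line by line, not all 2 GB at once
        for batch in batches(handle, size):
            send(batch)
            total += len(batch)
    return total
```

Calling `index_file("data.jsonl")` (a hypothetical file name) would stream the file to Solr in batches of 1000. The `send` parameter is there so you can swap in a different sender or a stub while testing the parsing.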

Longer term, SolrJ will let you get maximum performance (if you need it), because you can:

  1. send documents directly to the leader replica in a SolrCloud environment
  2. use multiple threads (or multiple processes) to ingest and send docs. This is possible with many other technologies too, but with some it is harder or impossible.
  3. use the full flexibility of the SolrJ API
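To illustrate point 2 only, here is a rough sketch of multi-threaded ingestion. It is written in Python to stay in the scripting-prototype setting and is not SolrJ code. The function names, worker count and batch size are assumptions, and `send` stands for any function that posts one batch of documents to Solr.

```python
import json
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from typing import Callable, Iterable, Iterator


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def parallel_ingest(lines: Iterable[str],
                    send: Callable[[list], None],
                    workers: int = 4,
                    size: int = 1000) -> int:
    """Parse JSON Lines and hand batches to `send` from several threads.

    Returns the number of documents submitted. At most `workers * 2` batches
    are in flight at once, so a large file is never held fully in memory.
    """
    docs = (json.loads(line) for line in lines if line.strip())
    total = 0
    pending = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in chunked(docs, size):
            if len(pending) >= workers * 2:
                # Block until at least one batch finishes before reading more.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    future.result()  # re-raise any error from a worker
            pending.add(pool.submit(send, batch))
            total += len(batch)
        for future in pending:
            future.result()  # wait for the remaining batches
    return total
```

Threads help here because the work is dominated by waiting on HTTP responses. Batches may arrive at Solr out of file order, which is usually fine for indexing independent documents.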
Persimmonium