
I'm in need of a basic definition and usage expectation for Elasticsearch. I have an ever-growing folder of CSV data files.

Elasticsearch likes JSON. I get that, and I can convert the files over with no issue.

What I need to know is this: does each CSV row need to be in its own file.json to be considered for indexing? Is that what a document is? Or do I stack JSON entries into a single file and run them in for indexing in bulk? Is each JSON entry a document, or is the file.json itself the document as Elasticsearch sees it?

Thanks.

arcee123

1 Answer


Basically, each CSV row is considered a document once turned into JSON. Now, you have a few options.

A. You can keep your CSV files as they are and use Logstash to consume them with a csv filter, sending the resulting JSON documents to Elasticsearch.
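
For example, a minimal Logstash pipeline sketch for this could look like the following (the file path, column names, and index name are placeholders you'd adapt to your data):

input {
  file {
    path => "/path/to/your/folder/*.csv"   # placeholder: your CSV folder
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["Col1", "Col2", "Col3"]    # placeholder: your header columns
    skip_header => true                    # drop the header row itself
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my-index"                    # placeholder index name
  }
}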

B. You can transform your CSV file into another file where each CSV row is turned into a one-line JSON document, i.e.

Instead of

Col1,Col2,Col3
Cell11,Cell12,Cell13
Cell21,Cell22,Cell23

You have

{ "Col1": "Cell11", "Col2": "Cell12", "Col3": "Cell13" }
{ "Col1": "Cell21", "Col2": "Cell22", "Col3": "Cell23" }

But you'd still need to use Logstash in order to load that newline-delimited JSON file into Elasticsearch.
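
If you'd rather script that transformation yourself, here's a minimal Python sketch (file names are placeholders; it assumes the first CSV row is the header):

import csv
import json

# Write each CSV row as a one-line JSON document (newline-delimited JSON).
with open("data.csv", newline="") as src, open("data.ndjson", "w") as dst:
    for row in csv.DictReader(src):  # DictReader uses the header row as keys
        dst.write(json.dumps(row) + "\n")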

C. A third option is to transform the CSV file into a so-called bulk file that would look like this:

{ "index": {}}
{ "Col1": "Cell11", "Col2": "Cell12", "Col3": "Cell13" }
{ "index": {}}
{ "Col1": "Cell21", "Col2": "Cell22", "Col3": "Cell23" }

And then you can load that file using a single command via the Bulk API.
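
Generating that bulk file is the same idea as option B, just with an action line interleaved before each document. A minimal Python sketch (again with placeholder file names):

import csv
import json

# Interleave a bulk "index" action line before each JSON document.
with open("data.csv", newline="") as src, open("bulk.ndjson", "w") as dst:
    for row in csv.DictReader(src):
        dst.write('{ "index": {}}\n')
        dst.write(json.dumps(row) + "\n")

Since the action lines don't name an index, the target index goes in the URL when you send the file, e.g. with curl ("my-index" is a placeholder, and the exact URL can vary with your Elasticsearch version):

curl -s -H "Content-Type: application/x-ndjson" -XPOST "http://localhost:9200/my-index/_bulk" --data-binary "@bulk.ndjson"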

Val
  • Thanks Val...this makes sense. Sometimes the assumption of knowledge can kill a lot of things. Good stuff. Thanks! – arcee123 May 04 '18 at 15:55