
Is there any way to import data from a JSON file into Elasticsearch without having to provide an ID for each document?

I have some data in a JSON file. It contains around 1000 documents, but no ID has been specified for any of them. Here's what the data looks like:


{"business_id": "aasd231as", "full_address": "202 McClure 15034", "hours":{}}
{"business_id": "123123444", "full_address": "1322 lure 34", "hours": {}}
{"business_id": "sd231as", "full_address": "2 McCl 5034", "hours": {}}

It does not have {"index":{"_id":"5"}} before any document. Now I am trying to import the data into Elasticsearch using the following command:

curl -XPOST localhost:9200/newindex/newtype/_bulk?pretty --data-binary @path/file.json

But it throws the following error:

"type" : "illegal_argument_exception",
"reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]"

This is because the bulk API expects an action/metadata line, such as {"index":{"_id":"5"}}, before each document, and my file has none.

Is there any way to import the data without providing {"index":{"_id":"5"}} before each document? Any help will be highly appreciated!
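
For reference, the _bulk endpoint auto-generates an ID whenever the action line omits _id, so an empty {"index":{}} line before each document is enough. A minimal preprocessing sketch (assuming GNU sed, which accepts \n in the replacement; the file path and index names are taken from the question):

# prepend an empty bulk action line before every document
sed 's/^/{"index":{}}\n/' path/file.json > path/bulk.json
curl -XPOST localhost:9200/newindex/newtype/_bulk?pretty --data-binary @path/bulk.json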

Anoop Sharma

2 Answers


How about using Logstash, which is perfectly suited for this task? Save the following config in logstash.conf:

input {
  file {
    path => "/path/to/file.json"
    start_position => "beginning"
    # don't persist the read position, so the file is re-read from the beginning on every run
    sincedb_path => "/dev/null"
    # parse each line of the file as a JSON document
    codec => "json"
  }
}
filter {
  mutate {
    # drop the metadata fields Logstash adds to each event
    remove_field => [ "@version", "@timestamp", "path", "host" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "newindex"
    document_type => "newtype"
    workers => 1
  }
}

Then start Logstash with:

bin/logstash -f logstash.conf
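
Once Logstash reports that the pipeline has started, you can verify the documents were indexed with a simple search (the same check suggested in the comments below):

curl -XGET localhost:9200/newindex/newtype/_search?pretty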
Val
  • I tried using the config in logstash but it throws this error: io/console not supported; tty will not be manipulated The signal HUP is in use by the JVM and will not work correctly on this platform – Anoop Sharma Jun 28 '16 at 11:58
  • What command are you running and what version of Logstash do you have? Make sure to save the above config in a file `logstash.conf` and then run `bin/logstash -f logstash.conf` – Val Jun 28 '16 at 12:03
  • The error got resolved. Now it says, Pipeline Main Started. Has it loaded the data? How do I access my data? – Anoop Sharma Jun 28 '16 at 12:09
  • No error anymore is a good sign. Go check in your ES if you got some new data at `curl -XGET localhost:9200/newindex/newtype/_search` – Val Jun 28 '16 at 12:10
  • Thanks! That worked. The data is imported into ES, but it's not directly under "_source", it's inside a "_message". Also, all the field names and values are enclosed between "\". How do I resolve that? – Anoop Sharma Jun 28 '16 at 12:18
  • My bad, I forgot to specify `codec => "json"`. Delete your index, fix the config as in my answer above and restart logstash. – Val Jun 28 '16 at 12:20
  • Hey. Just one last thing. I am trying to upload a 2GB data file using the same procedure. Logstash runs without errors, but the data is not showing up in ES. Any idea? – Anoop Sharma Jun 28 '16 at 12:33
  • Try to rename your JSON file and adapt the config file accordingly. Also, do you see any errors in the elasticsearch log? – Val Jun 28 '16 at 12:55
  • No, there are no errors in the ES log. The created index just doesn't show up in ES. – Anoop Sharma Jun 28 '16 at 13:03
  • Weird, what changed since you made it work earlier? – Val Jun 28 '16 at 13:04
  • File size is the only change. I am trying it now on a 2GB JSON file. Is it possible that it is getting processed at the backend and will take some time to show up in the ES indices? – Anoop Sharma Jun 28 '16 at 13:10
  • You're 100% sure that your file contains one JSON document per line with no newlines within documents (i.e. one document does not span multiple lines)? – Val Jun 28 '16 at 13:13
  • I have checked it for the first few lines and it holds true. But I can't really say that about the whole file since there are a lot of documents in it. – Anoop Sharma Jun 28 '16 at 13:19
  • I'd say try a few dozen documents, then a few hundred, etc... Hard to say what is going on. – Val Jun 28 '16 at 13:23
  • Cool. Will try that. Thanks!! – Anoop Sharma Jun 28 '16 at 13:25
  • Yes. Actually it seems to be happening because of the size of dataset. When I split it into 4 datasets and import, it works fine. – Anoop Sharma Jun 29 '16 at 10:18
  • Awesome, glad you figured it out. – Val Jun 29 '16 at 10:39

Another option, and perhaps the easier one since you are not filtering the data, is to use Filebeat. The latest filebeat-5.0.0-alpha3 has a JSON shipper.
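
A minimal sketch of what such a config could look like, saved as filebeat.yml (the json.* options are Filebeat 5.x JSON decoding settings; the path, host, and index name are placeholder assumptions):

filebeat.prospectors:
- input_type: log
  paths:
    - /path/to/file.json
  # decode each line as JSON and place the fields at the top level of the event
  json.keys_under_root: true
  # add an error field if a line cannot be decoded
  json.add_error_key: true

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "newindex"

Then start it with ./filebeat -e -c filebeat.yml.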

Sahas