
I'm trying to upload an 800GB file to Elasticsearch, but I keep getting a memory error telling me the data binary is out of memory. I have 64GB of RAM on my system and 3TB of storage.

curl -XPOST 'http://localhost:9200/carrier/doc/1/_bulk' --data-binary @carrier.json

I'm wondering if there is a setting in the config file to increase the amount of memory so I can upload this file.

thanks

Taimoor Khan

1 Answer


800GB is quite a lot to send in one shot. ES has to hold the whole request body in memory in order to process it, so that's probably too big for the amount of RAM you have.

One way around this is to split your file into several smaller ones and send them one after another. You can achieve this with a small shell script like the one below.

#!/bin/sh

# split the main file into chunks of at most 10,000 lines each
split -l 10000 -a 10 carrier.json /tmp/carrier_bulk

# send each split file to the bulk endpoint
BULK_FILES=/tmp/carrier_bulk*
for f in $BULK_FILES; do
    curl -s -XPOST http://localhost:9200/_bulk --data-binary "@$f"
done
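
One caveat worth adding (my note, not part of the original answer): the bulk body is a sequence of action/source line pairs, so the line count given to split should stay even so no pair gets cut in half (10,000 is fine for plain index actions). On ES 6.x and later you also have to send the NDJSON content type explicitly, along these lines:

# same request as in the loop above, with the content type newer ES versions require
curl -s -H 'Content-Type: application/x-ndjson' -XPOST http://localhost:9200/_bulk --data-binary "@$f"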

UPDATE

If you want to interpret the ES response, you can easily do so by piping it to a small Python one-liner like this:

curl -s -XPOST $ES_HOST/_bulk --data-binary "@$f" | python -c 'import json,sys;obj=json.load(sys.stdin);print("    <- Took %s ms with errors: %s" % (obj["took"], obj["errors"]))'
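
If you only want to see the failing items (see the comment below), one further option (not in the original answer, and assuming an ES version that supports response filtering) is to use the filter_path query parameter so the bulk response only contains the error entries:

# ask ES to return only the global "errors" flag and the per-item error details
curl -s -XPOST "$ES_HOST/_bulk?filter_path=errors,items.*.error" --data-binary "@$f"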
Val
  • Very nice example! Is there an easy way to output only the errors (for when memory runs out even when split into 10,000 lines)? By default, ES returns a fully detailed result for every item, making it impossible to search for errors. – Gabe Hiemstra Feb 11 '17 at 16:05
  • @GabeHiemstra Sure, I've updated my answer with an example – Val Feb 11 '17 at 17:49