Dumping JSON data into Elasticsearch through Python

Question

I'm trying to dump some JSON data that I parsed from a CSV file into Elasticsearch. I've checked the data and it's in the right format, but for some reason I'm getting the error below.

{
  "error" : {
    "root_cause" : [ {
      "type" : "parse_exception",
      "reason" : "Failed to derive xcontent"
    } ],
    "type" : "parse_exception",
    "reason" : "Failed to derive xcontent"
  },
  "status" : 400
}

This is the command I'm using. I'm not sure why it's not working:

curl -XPUT 'http://192.168.99.100:9200/_bulk?pretty' --data-binary "@data.json"

Edit: This is part of my JSON data. I understand that the error probably comes from it, but I can't tell what is wrong. I've read through the Bulk API documentation for Elasticsearch.

{"index": {"_index": "jobs", "_id": 119556, "_type": "2014_jobs"}}
{"job category": "Logistics / Supply Chain|Purchasing / Merchandising|Human resource consultancy services|Full Time|Executive|Manager|Middle Management|", "closing date": "28-Dec-14", "salary": "Not published", "posting date": "28-Nov-14", "working hours": "-", "company": "", "contact": "+65 66454545 / ctay sg. drakeintl. com", "description": "", "title": "Logistics Category Manager", "job level": "Executive|Manager|Middle Management", "shift pattern": "Day Shift", "job id": "JOB-2014-0119556", "industry": "Human resource consultancy services", "employment type": "Full Time", "min years of experience": "8", "skills": "", "timestamp": "1.41973E+12", "address": "1 RAFFLES PLACE| 20-01 ONE RAFFLES PLACE||Singapore 048616|"}
{"index": {"_index": "jobs", "_id": 119700, "_type": "2014_jobs"}}
{"job category": "F B|Hospitality|Logistics / Supply Chain|Purchasing / Merchandising|Hotels with restaurant|Full Time|Fresh/entry level|Non-executive|", "closing date": "28-Dec-14", "salary": "Not published", "posting date": "28-Nov-14", "working hours": "44 hours a week", "company": "DAISHO DEVELOPMENT SINGAPORE PTE LTD", "contact": "Not available", "description": "", "title": "Culinary Logistic Agent", "job level": "Fresh/entry level|Non-executive", "shift pattern": "No Shift", "job id": "JOB-2014-0119700", "industry": "Hotels with restaurant", "employment type": "Full Time", "min years of experience": "1", "skills": "", "timestamp": "1.41973E+12", "address": "12 MARINA VIEW| 35-00 ASIA SQUARE TOWER 2||Singapore 018961|"}
{"index": {"_index": "jobs", "_id": 118701, "_type": "2014_jobs"}}
{"job category": "Architecture / Interior Design|Architectural services|Permanent|", "closing date": "30-Dec-14", "salary": "Not published", "posting date": "27-Nov-14", "working hours": "8am - 6pm", "company": "LOOK ARCHITECTS PTE. LTD. ", "contact": "jasmin lookarchitects. com", "description": "", "title": "Resident Technical Officer (RTO)", "job level": "-", "shift pattern": "No Shift", "job id": "JOB-2014-0118701", "industry": "Architectural services", "employment type": "Permanent", "min years of experience": "5", "skills": "", "timestamp": "1.41973E+12", "address": "18 BOON LAY WAY| 09-135 TRADEHUB 21||Singapore 609966|"}

You write "I've checked the data" but there is a `parse_exception` ... Please provide some sample code, and more importantly some sample data, otherwise it might be very hard for anyone to figure out what the problem really is. Right now I'd have to guess. — mbdevpl, May 26 '16 at 10:20
Hey, yeah i realized that a while after posting this. I'll post some code excerpts from my json data. I can't seem to figure out what is wrong. I've read through the bulk_api and it follows their conventions. — Kausik Venkat, May 27 '16 at 01:55
The data seems correct, does https://stackoverflow.com/questions/37457267/dumping-json-data-into-elasticsearch-through-python#37473739 my answer fix the problem? — mbdevpl, May 27 '16 at 02:19
Hey, that doesn't solve the issue for some reason. I've also been getting a different error this time. "reason" : "Malformed content, found extra data after parsing: START_OBJECT" — Kausik Venkat, May 27 '16 at 02:24
the error sounds like the json data is the issue but I'm generating it through a python script so i don't see why there might be issues if the first few lines are fine :S — Kausik Venkat, May 27 '16 at 02:25
Did you try curl with the exact sample above? Also, python script might generate malformed data somehow, if there is a bug in it. Did you try running curl with some hand-written very simple examples? Did they work? — mbdevpl, May 27 '16 at 03:17
It worked for other simple examples. I believe the error is in the way I wrote my JSON file. I'm looking into that now. — Kausik Venkat, May 27 '16 at 07:11
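
Since the thread narrows the problem down to how the file is written, a quick local check of the file can help before sending it. The sketch below is an assumption-laden helper (the name `check_bulk_text` is hypothetical, and it does not reproduce Elasticsearch's own parser): it only verifies that the body ends with a newline, that every line is exactly one JSON object, and that action lines are followed by a source line where one is expected. Two objects on the same line, or an object pretty-printed across several lines, would be reported.

```python
import json

# Action names accepted on a bulk action line.
ACTIONS = {"index", "create", "update", "delete"}


def check_bulk_text(text):
    """Return a list of human-readable problems found in a bulk request body."""
    problems = []
    if not text.endswith("\n"):
        problems.append("body does not end with a newline")

    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline

    expect_source = False
    for number, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except ValueError as exc:
            # Raised for blank lines, partial objects, and extra data on a line.
            problems.append(f"line {number}: not a single JSON value ({exc})")
            expect_source = False
            continue
        if not isinstance(obj, dict):
            problems.append(f"line {number}: expected a JSON object")
            expect_source = False
            continue
        if expect_source:
            expect_source = False  # this line is the document for the previous action
            continue
        keys = set(obj)
        if len(keys) != 1 or not keys <= ACTIONS:
            problems.append(f"line {number}: expected an action line")
            continue
        # A delete action is not followed by a source line.
        expect_source = "delete" not in keys

    if expect_source:
        problems.append("last action line has no source line after it")
    return problems
```

For a file on disk, something like `check_bulk_text(open("data.json", encoding="utf-8").read())` returns an empty list when none of these problems are present.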

score 0 · Answer 1 · answered May 26 '16 at 10:08

Try replacing _bulk with _update:

curl -XPUT 'http://192.168.99.100:9200/_update?pretty' --data-binary "@data.json"

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html

answered May 26 '16 at 10:08

Yaron


update didn't work either. I think the error is in the format of my json objects. – Kausik Venkat May 27 '16 at 01:54

score 0 · Answer 2 · edited May 23 '17 at 10:28

The error you're getting is the same as in another question, where the problem was that the full path to the data file was not provided (see the answer to that question).

In that case, changing from:

curl -XPUT 'http://192.168.99.100:9200/_bulk?pretty' --data-binary "@data.json"

to:

curl -XPUT 'http://192.168.99.100:9200/_bulk?pretty' --data-binary "@/full/path/to/data.json"

should fix the problem.
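
If the command is launched from a script, the full path can be resolved programmatically instead of typed by hand. This is only a sketch: the helper name `bulk_curl_command` is hypothetical, the host is the one from the question, and the thread does not confirm that a relative path was the actual cause of the error. It fails early when the file does not exist, which makes a wrong path easier to spot.

```python
import os


def bulk_curl_command(data_file, host="http://192.168.99.100:9200"):
    """Build the curl argument list for a bulk request, using an absolute file path."""
    full_path = os.path.abspath(data_file)
    if not os.path.isfile(full_path):
        # Report a bad path here rather than sending a request that cannot succeed.
        raise FileNotFoundError(full_path)
    return [
        "curl",
        "-XPUT",
        f"{host}/_bulk?pretty",
        "--data-binary",
        f"@{full_path}",
    ]
```

The returned list can be passed to `subprocess.run(...)`, or joined with spaces and pasted into a shell if the path contains no special characters.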
