1

I have data in the format below and was hoping to pre-process it for bulk indexing in Elasticsearch.

{"title":"April","url":"https://simple.wikipedia.org/wiki/April", "abstract":"April is the 4th month of the year, and comes between March and May. It is one of four months to have 30 days.","sections":["The Month","April in poetry","Events in April","Fixed Events","Moveable Events","Selection of Historical Events","Trivia","References"]}
{"title":"August","url":"https://simple.wikipedia.org/wiki/August", "abstract":"August (Aug.) is the 8th month of the year in the Gregorian calendar, coming between July and September.","sections":["The Month","August observances","Fixed observances and events","Moveable and Monthlong events","Selection of Historical Events","Trivia","References"]}

I am trying to add an index/type action line before each of my lines:

{"index":{"_index":"myindex","_type":"wiki","_id":"1"}}

After reading earlier posts, I am using the approach from Kevin Marsh's post, as below:

cat file.json jq -c '.[] | {"index": {"_index": "myindex", "_type": "wiki", "_id": .id}}, .' 

I have left out the pipe for now because I am trying to figure out the error that occurs before that step. I get the error "jq: No such file or directory". Running jq --version returns jq-1.5-1-a5b5cbe.

Any help is much appreciated.

Hatim Stovewala
ESLearner

2 Answers

1

Here you go. This worked for me. Let me know if this helps.

cat data.json | jq -c '. | {"index": {"_index": "json", "_type": "json"}}, .'  | curl -XPOST localhost:9200/_bulk --data-binary @-

Learn more about jq, a lightweight and flexible command-line JSON processor.
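As a small illustration of what the filter above does, here is a minimal sketch using the index and type names from the question ("myindex", "wiki"). The function name to_bulk is only illustrative. It assumes the input has one JSON object per line, as in the question's sample, which is why the filter uses "." rather than ".[]".

```shell
#!/usr/bin/env bash
# Sketch: prepend a bulk "index" action line to every document in a
# newline-delimited JSON stream. The function name is hypothetical;
# the index and type names are the ones given in the question.
to_bulk() {
  # jq reads one JSON object at a time from stdin. For each object it
  # emits the action object, then the unchanged document.
  # -c prints each result on a single line, which the _bulk API requires.
  jq -c '{"index": {"_index": "myindex", "_type": "wiki"}}, .'
}

# Example: two small documents in, four bulk lines out
# (action, document, action, document).
printf '%s\n' '{"title":"April"}' '{"title":"August"}' | to_bulk
```

No "_id" is set here, so Elasticsearch would assign one itself. Checking this output on a few lines first, before piping it to curl, makes quoting mistakes easier to spot.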

Hatim Stovewala
  • Will absolutely format going forward. Sorry about that. Hatim, I used the command below in the format you gave: – ESLearner Oct 18 '17 at 04:04
  • cat data.json | jq -c '. | {"index": {"_index": "json", "_type": "json"}}, .' | curl -XPOST localhost:9200/_bulk --data-binary @- I get the error below: jq: error: syntax error, unexpected $end (Unix shell quoting issues?) at <top-level>, line 1 – ESLearner Oct 18 '17 at 04:09
  • Forget about the curl part. Try this "cat data.json | jq -c '. | {"index": {"_index": "json", "_type": "json"}}, .' " and see what output you get. – Hatim Stovewala Oct 18 '17 at 05:46
  • Hatim, you are a genius! You made me think about what I was doing wrong. I retained the original format, ran it for the first 10 rows, and it retrieved them successfully. I went ahead and created the index as well. What I am doing now is letting the command run for the 1000 odd rows that I have. It seems to be working through them slowly... will post and let you know. Appreciate all your help. – ESLearner Oct 18 '17 at 06:47
  • Hmm, I got an error after the command had processed about 20K lines. The error is: curl: (56) Recv failure: Connection reset by peer – ESLearner Oct 18 '17 at 13:34
0

We've found it is necessary to specify Content-Type in the curl header; the suggested solution should be of the form:

cat data.json | jq -c '. | {"index": {"_index": "json", "_type": "json"}}, .' | curl -H "Content-Type: application/json" -XPOST localhost:9200/_bulk --data-binary @-
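For larger inputs, such as the run mentioned in the comments that failed with "Connection reset by peer" after about 20K lines, one option is to send the bulk body in several smaller requests instead of one. This is only a sketch under assumptions: the thread does not confirm that request size was the cause, and the function and file names (split_bulk, post_chunks, bulk.ndjson, chunk_) are hypothetical. It assumes the jq output has already been saved to a file that alternates action and document lines.

```shell
#!/usr/bin/env bash
# Sketch: split a prepared bulk file into smaller pieces and POST them
# one at a time. Host and port are the ones used in the thread.

# split_bulk FILE LINES_PER_CHUNK PREFIX
# LINES_PER_CHUNK must be even so that an action line is never
# separated from the document line that follows it.
split_bulk() {
  local file=$1 lines=$2 prefix=$3
  if [ $((lines % 2)) -ne 0 ]; then
    echo "lines per chunk must be even" >&2
    return 1
  fi
  # split names the pieces PREFIXaa, PREFIXab, ... in order.
  split -l "$lines" "$file" "$prefix"
}

# post_chunks PREFIX
# Sends each chunk to the _bulk endpoint and stops at the first request
# that fails at the transport or HTTP level. Per-document errors are
# reported inside the response body, which is printed for inspection.
post_chunks() {
  local prefix=$1 chunk
  for chunk in "$prefix"*; do
    curl -sS --fail -H "Content-Type: application/json" \
         -XPOST localhost:9200/_bulk --data-binary "@$chunk" || return 1
    echo
  done
}

# Usage (hypothetical file names):
#   split_bulk bulk.ndjson 2000 chunk_ && post_chunks chunk_
```

The chunk size of 2000 lines in the usage comment is arbitrary; a suitable value depends on document size and cluster settings.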