
I am trying to use the Elasticsearch bulk API to insert multiple records into an index. My JSON looks something like this: request json

I am appending a newline (`\n`) at the end of the document, but I'm still getting the newline error.

    Error: {
        "error": {
            "root_cause": [
                {
                    "type": "illegal_argument_exception",
                    "reason": "The bulk request must be terminated by a newline [\n]"
                }
            ],
            "type": "illegal_argument_exception",
            "reason": "The bulk request must be terminated by a newline [\n]"
        },
        "status": 400
    }

  • It is difficult to copy from an image; can you paste the JSON as well? –  Apr 12 '20 at 10:52
  • I have tried with the kind of data you have given. I created a .json file with this content: {"index":{}} {"tags":["ab","cd"],"question":"test this","answer":"answer first"} {"index":{}} {"tags":["de","fg"],"question":"test second","answer":"answer second"}, which is similar to your content, and the bulk API runs fine. Can you add more detail on how you are hitting this API by pasting the cURL request? –  Apr 12 '20 at 11:12
  • I am using the Elasticsearch service on AWS through a Lambda function and trying to post this data using the AWS NodeHttpClient. Not sure what the request body would actually look like in this particular scenario. – Prashant Agarwal Apr 12 '20 at 11:43

2 Answers


Based on my previous answer and https://stackoverflow.com/a/50754789/8160318:

const AWS = require('aws-sdk');
const creds = new AWS.EnvironmentCredentials('AWS');

const INDEX_NAME = 'index_name';

const esDomain = {
    region: 'us-east-1',
    endpoint: 'yoursearchdomain.region.amazonaws.com',
    index: 'myindex',
    doctype: 'mytype'
};

const endpoint = new AWS.Endpoint(esDomain.endpoint);
const req = new AWS.HttpRequest(endpoint);


const docs_as_body_params = JSON.parse(
    '[' +
    `{"index":{}} {"tags":["ab","cd"],"question":"test this","answer":"answer first"} {"index":{}} {"tags":["de","fg"],"question":"test second","answer":"answer second"}`.split(
        /(\s?{"index":{}} )/g
    )
    .filter(match => match.length)
    .filter((_, index) => index % 2 !== 0) +
    ']'
);

const bulk_body = [];
docs_as_body_params.forEach((doc) => {  // forEach: we only need the side effect
    bulk_body.push({
        index: {
            _index: INDEX_NAME,
            _id: doc.id || null
        }
    });
    bulk_body.push(doc);
});

/// THE MOST IMPORTANT PART -- getting to a valid ndjson
const ndjson_payload = bulk_body.map(JSON.stringify).join('\n') + '\n';

req.method = 'POST';
req.path = '/_bulk';                                  // path must start with a slash
req.region = esDomain.region;
req.headers['presigned-expires'] = false;
req.headers['Host'] = endpoint.host;
req.headers['Content-Type'] = 'application/x-ndjson'; // bulk payloads are ndjson
req.body = ndjson_payload;

var signer = new AWS.Signers.V4(req, 'es');
signer.addAuthorization(creds, new Date());

var send = new AWS.NodeHttpClient();
send.handleRequest(req, null, function (httpResp) {
    var respBody = '';
    httpResp.on('data', function (chunk) {
        respBody += chunk;
    });
    httpResp.on('end', function (chunk) {
        console.log('Response: ' + respBody);
        // `doc` from the forEach above is not in scope here
        context.succeed('Lambda added ' + docs_as_body_params.length + ' documents');
    });
}, function (err) {
    console.log('Error: ' + err);
    context.fail('Lambda failed with error ' + err);
});
Joe - GMapsBook.com

Your JSON was newline-delimited JSON (NDJSON) at some point but looks mangled now, so we'll have to do some cleanup beforehand.

Initialize:

const {
    Client
} = require("@elastic/elasticsearch");

const client = new Client({
    node: 'http://localhost:9200'
});

const INDEX_NAME = 'index_name';

Convert the would-be ndjson into a consumable array of objects:

const docs_as_body_params = JSON.parse(
    '[' +
    `{"index":{}} {"tags":["ab","cd"],"question":"test this","answer":"answer first"} {"index":{}} {"tags":["de","fg"],"question":"test second","answer":"answer second"}`.split(
        /(\s?{"index":{}} )/g
    )
    // filter out empty strings
    .filter(match => match.length)
    // take every odd member (skipping `{"index":{}}`)
    .filter((_, index) => index % 2 !== 0) +
    ']'
);

Construct the bulk body:

const bulk_body = [];
docs_as_body_params.forEach((doc) => {  // forEach: we only need the side effect
    bulk_body.push({
        index: {
            _index: INDEX_NAME,
            _id: doc.id || null
        }
    });
    bulk_body.push(doc);
});

Perform bulk indexing:

client.bulk({
        body: bulk_body
    },
    (err, resp) => {
        if (err) {
            console.error(err);  // NB: console.err doesn't exist -- use console.error
            return;
        }
        if (resp.body.errors) {  // per-item failures are flagged on resp.body
            console.error('Some items failed:', resp.body.items);
        } else {
            console.info(resp.body.items);
        }
    }
);
Joe - GMapsBook.com
  • Hi @jzzfs, thanks for the detailed answer. I am using AWS as the Elasticsearch service provider and using AWS Lambda to perform tasks on Elasticsearch. This approach seems perfect for a local setup of ES, but I have to sign the request with the AWS-provided role's credentials (AWS.Signers.V4(req, "es")) before performing any task. Instead of using the ES client for Node.js, can it be done using the AWS NodeHttpClient? It just needs a valid NDJSON request body. – Prashant Agarwal Apr 12 '20 at 14:12
  • I'll post a new answer because that's a completely different question... – Joe - GMapsBook.com Apr 12 '20 at 17:30