
I read a tutorial about indexing documents in Elasticsearch that includes an example of bulk indexing. My question: is it correct that each loop iteration adds two separate entries to the array for a single item?

for($i = 0; $i < 100; $i++) {
    $params['body'][] = array(
        'index' => array(
            '_id' => $i
        )
    );

    $params['body'][] = array(
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    );
}

Why are there two separate $params['body'][] assignments in the loop? Shouldn't the index settings go in the same array as my_field?

I mean something like this, where all the information for a document is added under a single entry:

$params['body'][] = array(
    'index' => array(
        '_id' => $i
    ),

    'my_field' => 'my_value',
    'second_field' => 'some more values'
);

Also, after running a search query I get this error:

Message: Illegal string offset 'match'

on the line:

$query['match']['name'] = $query;

where $query is a string.

I suspect this error is caused by a problem in how the index is created, so I started there.

Here is my code that adds documents to the index:

private function addDocument($data = array(), $type)
    {
        if (!empty($data)) {
            foreach ($data as $key => $val) {
                $params['body'][] = array(
                    'index' => array(
                        '_id' => $key,
                        '_type' => 'profiles',
                        '_index' => $this->_typeIndex($type)
                    )
                );

                $params['body'][] = (array)$val;
            }

            $this->client->bulk($params);
        }

    }

Is this right? I ask because the search fails with the error described here.

Babaev

1 Answer


For bulk indexing to work, the payload must contain, for each document, one command line (index, type, and id of the document) followed by one content line (the actual fields of the document), like this:

{"index": {"_id": "1234"}}               <--- command for doc1
{"field1": "value1", "field2": "value2"}  <--- source for doc1
{"index": {"_id": "1235"}}               <--- command for doc2
{"field1": "value1", "field2": "value2"}  <--- source for doc2
...

The PHP example you cited does exactly this:

$params['body'][] = array(
    'index' => array(
        '_id' => $i
    )
);

This creates the command line, which reads {"index": {"_id": "0"}}, and

$params['body'][] = array(
    'my_field' => 'my_value',
    'second_field' => 'some more values'
);

creates the content line, which reads {"my_field": "my_value", "second_field": "some more values"}.

The for loop runs 100 times, so it produces a payload of 200 lines for 100 documents.
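
To make the two-entries-per-document pattern concrete, here is a minimal sketch that rebuilds the tutorial loop and serializes the body by hand. Note that toBulkPayload is a hypothetical helper written for illustration only; the client library does its own serialization when you call bulk(), so you would not normally need it.

```php
<?php
// Hypothetical helper (not part of the Elasticsearch client): turns a bulk
// body array into newline-delimited JSON, one array entry per line.
function toBulkPayload(array $body): string
{
    $lines = array();
    foreach ($body as $entry) {
        $lines[] = json_encode($entry);
    }
    // Every line, including the last one, ends with a newline.
    return implode("\n", $lines) . "\n";
}

// Build the same body as the tutorial loop: two entries per document.
$params = array('body' => array());
for ($i = 0; $i < 100; $i++) {
    $params['body'][] = array('index' => array('_id' => $i));   // command entry
    $params['body'][] = array(                                  // source entry
        'my_field' => 'my_value',
        'second_field' => 'some more values'
    );
}

$payload = toBulkPayload($params['body']);

echo count($params['body']), " entries\n";        // 200 entries
echo substr_count($payload, "\n"), " lines\n";    // 200 lines
```

Each $params['body'][] assignment appends one new array entry, and each entry becomes exactly one line of the payload, which is why the loop needs two assignments per document.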

If you merge the command and the fields into a single array entry, as you did with

$params['body'][] = array(
        'index' => array(
            '_id' => $i
        ),

        'my_field' => 'my_value',
        'second_field' => 'some more values'
    );

it will not work, because that produces a single line per document, like this:

{"index":{"_id": "0"}, "my_field": "my_value", "second_field": "some more values"}

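The bulk operation will then fail, because each document needs its command and its content on separate lines.

Keep the two entries separate and try again.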
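
If you want to catch this kind of mistake before sending the request, a small sanity check along these lines may help. This is only a sketch: findMalformedBulkEntries is a hypothetical function, and it assumes every operation in the body is an index action, so every command entry must be followed by exactly one content entry.

```php
<?php
// Hypothetical check, assuming the body contains only "index" operations.
// Returns a list of human-readable problems; an empty list means the body
// has the expected command/content alternation.
function findMalformedBulkEntries(array $body): array
{
    $problems = array();
    if (count($body) % 2 !== 0) {
        $problems[] = 'odd number of entries: a command has no content entry';
    }
    // Even positions (0, 2, 4, ...) must be command entries.
    for ($i = 0; $i < count($body); $i += 2) {
        $action = $body[$i];
        // A command entry holds the "index" key and nothing else.
        if (!is_array($action) || array_keys($action) !== array('index')) {
            $problems[] = "entry $i is not a pure index command";
        }
    }
    return $problems;
}

// Correct form: command and content as two separate entries.
$good = array(
    array('index' => array('_id' => 0)),
    array('my_field' => 'my_value', 'second_field' => 'some more values'),
);

// Merged form from the question: everything in one entry.
$merged = array(
    array(
        'index' => array('_id' => 0),
        'my_field' => 'my_value',
        'second_field' => 'some more values',
    ),
);

var_dump(findMalformedBulkEntries($good) === array());   // bool(true)
print_r(findMalformedBulkEntries($merged));              // lists two problems
```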

UPDATE

Your code is not working because you are adding too many lines. Remove the foreach and do it like this instead. I'm not sure what your id field is called, and I'm assuming the $data array contains the fields of the document to add.

private function addDocument($data = array(), $type)
{
    if (!empty($data)) {
        // Command entry: tells Elasticsearch where to index the document.
        $params['body'][] = array(
            'index' => array(
                '_id' => $data['id'],    // make sure to use the right id field
                '_type' => 'profiles',
                '_index' => $this->_typeIndex($type)
            )
        );
        // Content entry: the fields of the document itself.
        $params['body'][] = $data;

        $this->client->bulk($params);
    }
}
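
To confirm that the documents were actually indexed, it can also help to inspect what bulk() returns rather than discarding it. The sketch below assumes the response is the decoded bulk API response as an associative array, with a top-level errors flag and an items list keyed by action name. collectBulkFailures is a hypothetical helper and $sampleResponse is made-up data for illustration only.

```php
<?php
// Hypothetical helper: extracts the failed operations from a bulk response.
function collectBulkFailures(array $response): array
{
    $failures = array();
    if (empty($response['errors'])) {
        return $failures;   // no item reported an error
    }
    foreach ($response['items'] as $item) {
        // Each item is keyed by its action name, e.g. "index".
        foreach ($item as $action => $result) {
            if (isset($result['error'])) {
                $failures[] = array(
                    'action' => $action,
                    'id'     => $result['_id'] ?? null,
                    'error'  => $result['error'],
                );
            }
        }
    }
    return $failures;
}

// Made-up response for illustration: one success, one failure.
$sampleResponse = array(
    'errors' => true,
    'items'  => array(
        array('index' => array('_id' => '1', 'status' => 201)),
        array('index' => array('_id' => '2', 'status' => 400,
                               'error' => 'example failure reason')),
    ),
);

print_r(collectBulkFailures($sampleResponse));
```

In addDocument you could then write $response = $this->client->bulk($params); and pass $response to the helper to see which documents, if any, were rejected.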

Val