I am generating a large number of Elasticsearch documents with random content in Python and indexing them with elasticsearch-py.
Simplified working example (document with just one field):
from elasticsearch import Elasticsearch
from random import getrandbits

es_client = Elasticsearch('https://elastic.host:9200')

for i in range(1, 10000000):
    document = {'my_field': getrandbits(64)}
    es_client.index(index='my_index', document=document)
Since this makes one request per document, I tried to speed it up by sending chunks of 1000 documents each using the _bulk
API. However, my attempts so far have been unsuccessful.
My understanding from the docs is that you can pass an iterable to bulk(), so I tried:
from elasticsearch import Elasticsearch
from random import getrandbits

es_client = Elasticsearch('https://elastic.host:9200')

document_list = []
for i in range(1, 10000000):
    document = {'my_field': getrandbits(64)}
    document_list.append(document)
    if i % 1000 == 0:
        es_client.bulk(operations=document_list, index='my_index')
        document_list = []
but this fails with:

elasticsearch.BadRequestError: BadRequestError(400, 'illegal_argument_exception', 'Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]')
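From the error message, my current guess is that the client's low-level bulk() method expects the raw _bulk request body, i.e. an action/metadata line before each document, rather than a bare list of documents. If that is right, something like this sketch (untested) would be the expected shape:

from elasticsearch import Elasticsearch
from random import getrandbits

es_client = Elasticsearch('https://elastic.host:9200')

operations = []
for i in range(1, 10000000):
    # Each document source line is preceded by an action/metadata line;
    # an empty {'index': {}} should fall back to the index passed to bulk().
    operations.append({'index': {}})
    operations.append({'my_field': getrandbits(64)})
    if i % 1000 == 0:
        es_client.bulk(operations=operations, index='my_index')
        operations = []

Alternatively, the iterable-of-documents interface I remembered may be elasticsearch.helpers.bulk() (or streaming_bulk()) rather than the client method itself, which as far as I can tell handles the chunking and the action lines automatically. Is one of these the intended approach?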