
I am attempting to bulk index a generator using the parallel_bulk method from the elasticsearch.helpers module in Python, but this method doesn't seem to do anything. If I use the regular bulk method, ingestion into Elasticsearch runs just fine. I looked this problem up and came across this solution: https://discuss.elastic.co/t/helpers-parallel-bulk-in-python-not-working/39498 (it says the result generator has to be consumed), which I tried, but it still doesn't work. No error is output, and the iterator is not consumed. This is my code:
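For context, parallel_bulk returns a lazy generator and performs no indexing until that generator is consumed. The stand-in below (a hypothetical lazy_bulk, not the real helper) demonstrates the behaviour without needing a cluster:

```python
from collections import deque

# Hypothetical stand-in showing why parallel_bulk can appear to do nothing:
# like helpers.parallel_bulk, it returns a lazy generator, so no work
# happens until the caller iterates over the result.
def lazy_bulk(actions, sink):
    for action in actions:
        sink.append(action)          # the "indexing" side effect
        yield True, action

sink = []
results = lazy_bulk(iter([{'_id': 1}, {'_id': 2}]), sink)
assert sink == []                    # generator not consumed yet: nothing indexed

deque(results, maxlen=0)             # consuming the generator drives the work
assert len(sink) == 2                # now both actions were "indexed"
```

A for loop over the results consumes the generator just as well as the deque trick; the point is that something has to iterate it.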

@staticmethod
def fetch_rows(cursor):
    frame = cursor.fetchone()
    while frame is not None:
        yield frame
        frame = cursor.fetchone()

@staticmethod
def __generate_field(body):
    """
    Yields each item of the body, one at a time.
    :param body: iterable of JSON documents
    :return: item iterator
    """
    for item in body:
        yield item

def json_for_bulk_body_sql_list(self, body, index_name: str, name_of_docs: str):
    """
    :param body: iterable of documents to wrap as bulk actions
    :param index_name: name of the index based on location
    :param name_of_docs: name of the document type in the index
    :return: generator of structured actions for bulk indexing
    """

    # if not isinstance(body, list):
    #     raise TypeError('Body must be a list')
    if not isinstance(index_name, str):
        raise TypeError('index must be a string')

    structured_json_body = ({
        '_op_type': 'index',
        '_index': index_name,  # index name Twitter
        '_type': name_of_docs,  # type is tweet
        '_id': doc['tweet_id'],  # id of the tweet
        '_source': doc
    } for doc in self.__generate_field(body))
    return structured_json_body
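For illustration, this standalone sketch shows the shape of the action dicts that generator yields, using a made-up sample document (the field names mirror the method above):

```python
# Made-up sample document; the action shape matches what
# json_for_bulk_body_sql_list builds.
docs = [{'tweet_id': 't1', 'text': 'hello'}]
actions = ({
    '_op_type': 'index',
    '_index': 'twitter',        # index name
    '_type': 'tweet',           # doc type
    '_id': doc['tweet_id'],     # id of the tweet
    '_source': doc,
} for doc in docs)

first = next(actions)
assert first['_id'] == 't1'
assert first['_source']['text'] == 'hello'
```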


json_results = (dict(zip(column_names, row)) for row in self.fetch_rows(cursor))
actions = self.json_for_bulk_body_sql_list(json_results, index_name=index_, name_of_docs=doc_name)

for success, info in self.bulk_es_parallel(actions=actions):
    if not success:
        print('Doc failed: '.upper(), info)
    else:
        ingested += 1

I am doing exactly what the example in the linked solution describes, but there is still no ingestion into Elasticsearch. I can't quite figure out why, even after debugging.
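One thing worth checking, since generators are single-use: if json_results (or the cursor behind it) was already consumed somewhere else, the action generator is empty and parallel_bulk has nothing to index. A sketch with a hypothetical probe helper that peeks at the first action without losing it:

```python
import itertools

# Hypothetical debugging helper: peeks at the first action of a generator
# without consuming it for the caller, and warns when it is empty.
def probe(actions):
    actions, copy = itertools.tee(actions)
    first = list(itertools.islice(copy, 1))
    if not first:
        print('action generator is EMPTY -- nothing will be indexed')
    return actions

empty = probe(x for x in [])
assert list(empty) == []          # warns: generator was already exhausted

full = probe(x for x in [1, 2, 3])
assert list(full) == [1, 2, 3]    # probing did not consume any items
```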

Thank you so much!

Aboogie
