I am trying to index some documents using the elasticsearch.helpers.streaming_bulk function. When I iterate over the results, following the examples from here, I get the error: TypeError: 'function' object is not iterable.

This is my function:

import csv

import elasticsearch.helpers
from elasticsearch import Elasticsearch

def index_with_streaming_bulk(self):

    all_body = []

    with open(self.geonames_file, encoding='utf-8') as csvfile:

        reader = csv.reader(csvfile, delimiter='\t')
        body = []
        next(reader)  # skip column names
        for row_ind, row in enumerate(reader):
            body.append({
                "index": {
                    "_id": row_ind+1  # to map index value to geonames. remove the column headers
                }
            })
            doc = {}

            for field_tup in self.included_cols:
                field_name = field_tup[0]
                field_ind = field_tup[1]
                field_type = field_tup[2]
                val_init = row[field_ind]

                mod_val = self.transform_value(field_type, val_init)
                doc[field_name] = mod_val

            body.append(doc)
            all_body.append(body)

    def gendata():
        for body in all_body:
            yield body

    res = elasticsearch.helpers.streaming_bulk(client=es, actions=gendata, chunk_size=500,
                                               max_retries=5, initial_backoff=2, max_backoff=600,
                                               request_timeout=20)

    for ok, response in res:
        print(ok, response)

EDIT: This is the full stack trace:

"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py
Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py", line 267, in <module>
    Indexer(init_hydro_concat, index_name, doc_name).index_with_streaming_bulk()
  File "C:/Users/admin/PycharmProjects/ElasticSearch/ES_Indexer_Geonames.py", line 207, in index_with_streaming_bulk
    for ok, response in res:
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\elasticsearch\helpers\__init__.py", line 176, in streaming_bulk
    actions = map(expand_action_callback, actions)
TypeError: 'function' object is not iterable

Thanks for any help!

Litwos
  • Please paste the full stack trace. You probably used the name of a function instead of its result as one of the parameters somewhere, but it is difficult to guess where without the trace. – running.t Jul 16 '18 at 09:47
  • @running.t I put the whole stack trace – Litwos Jul 16 '18 at 10:29

2 Answers

The problem was how the body was constructed. Each action has to be a single dict that holds both the metadata (_index, _type, _id) and the document fields, and all of these dicts are collected in a list. This is the solution:

def index_with_streaming_bulk(self):

    all_body = []

    with open(self.geonames_file, encoding='utf-8') as csvfile:

        reader = csv.reader(csvfile, delimiter='\t')
        body = {}
        next(reader)  # skip column names

        for row_ind, row in enumerate(reader):

            body['_index'] = self.index_name
            body['_type'] = self.doc_type
            body['_id'] = row_ind + 1  # to map index value to geonames. remove the column headers

            for field_tup in self.included_cols:
                field_name = field_tup[0]
                field_ind = field_tup[1]
                field_type = field_tup[2]
                val_init = row[field_ind]

                mod_val = self.transform_value(field_type, val_init)
                body[field_name] = mod_val

            all_body.append(body)
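            body = {}  # start a fresh dict for the next row

    # Not used below: all_body is already an iterable of action dicts.
    def gendata():
        for body in all_body:
            yield body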

    res = elasticsearch.helpers.streaming_bulk(client=es, actions=all_body, chunk_size=1000, max_retries=5,
                                               initial_backoff=2, max_backoff=600, request_timeout=3600)
    for ok, response in res:
        print(ok, response)
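
As a complement to the list-based solution above, here is a minimal sketch of producing the same kind of flat action dicts lazily with a generator, so the whole file does not have to be held in memory. The sample data, `INCLUDED_COLS`, and `generate_actions` are hypothetical names made up for illustration; the sketch also leaves out `_type` and the value transformation, and does not talk to a cluster.

```python
import csv
import io

# Hypothetical stand-in for the tab-separated geonames file.
SAMPLE_TSV = "name\tcountry\nVienna\tAT\nGraz\tAT\n"

# (field name, column index) pairs; a simplified, assumed version of included_cols.
INCLUDED_COLS = [("name", 0), ("country", 1)]


def generate_actions(lines, index_name):
    """Yield one flat action dict per data row, without building a list first."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip column names
    for row_ind, row in enumerate(reader):
        # Metadata and document fields live in the same dict.
        action = {"_index": index_name, "_id": row_ind + 1}
        for field_name, field_ind in INCLUDED_COLS:
            action[field_name] = row[field_ind]
        yield action


# Materialized here only to inspect the result.
actions = list(generate_actions(io.StringIO(SAMPLE_TSV), "geonames"))
print(actions)
```

With a real client, the generator would presumably be passed as `actions=generate_actions(csvfile, self.index_name)`, with the parentheses, while the file is still open.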
Litwos

According to the elasticsearch.helpers.streaming_bulk documentation, the actions parameter is an iterable containing the actions to be executed, not a function that generates this iterable.

I found several usage examples of that function, and in all of them the value of the actions parameter is the result of calling a function, not the function itself. So I believe in your case it should be:

   res = elasticsearch.helpers.streaming_bulk(client=es, actions=gendata(), chunk_size=500, max_retries=5, initial_backoff=2, max_backoff=600, request_timeout=20)

Note the () after gendata: the function is actually called, and the generator it returns is passed as the argument, not the function itself.
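
The difference can be reproduced without Elasticsearch. The traceback shows the helper calling `map(...)` on `actions`, so this small sketch does the same with a toy generator function; `str` stands in for the helper's callback and is only an assumption for illustration.

```python
def gendata():
    # Toy actions; the real ones would be the document dicts.
    yield {"_id": 1}
    yield {"_id": 2}


# Passing the function object: map() cannot iterate over a function.
try:
    map(str, gendata)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the result of calling it: a generator, which is iterable.
mapped = list(map(str, gendata()))

print(error_message)
print(len(mapped))
```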

running.t
  • Thanks for the help. I tried that, but now I get this error: `TypeError: pop() takes at most 1 argument (2 given)` – Litwos Jul 16 '18 at 12:17