I have a small problem regarding elasticsearch-py. I need to access data from Elastic in Python, which I do via the scan() helper (which wraps the search/scroll API):
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

client = Elasticsearch("http://localhost:9200")  # placeholder host

def get_data_from_elastic():
    # match every document in the listed indices
    query = {
        "query": {
            "match_all": {}
        }
    }
    # scan() iterates over the scroll API and yields one hit at a time
    rel = scan(client=client,
               query=query,
               scroll='1m',
               index=['index-1', 'index-2', 'index-3'],
               raise_on_error=True,
               preserve_order=False,
               clear_scroll=True)
    # keep only the document bodies (_source)
    temp = [hit['_source'] for hit in rel]
    df = pd.DataFrame(temp)
    return df

df = get_data_from_elastic()
The function works, but I have over 100 indices with more than 200 million hits in total that I want to access. With this query it takes over half an hour to fetch just 10 million documents. Is there any way to make my API request more performant, or is pulling this much data simply not feasible?
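For reference, this is roughly the direction I imagine a faster version could take: fetch only the fields I actually need via _source filtering and split the scroll into parallel slices. The field names, slice count, and batch size below are placeholders, and I am not sure this is the right approach:

from concurrent.futures import ThreadPoolExecutor

N_SLICES = 4                      # placeholder: number of parallel slices
FIELDS = ["field_a", "field_b"]   # placeholder: the fields I actually need

def scan_slice(slice_id):
    query = {
        "slice": {"id": slice_id, "max": N_SLICES},  # sliced scroll
        "_source": FIELDS,        # return only the needed fields per hit
        "query": {"match_all": {}}
    }
    return [hit["_source"] for hit in scan(client=client,
                                           query=query,
                                           scroll="5m",
                                           size=5000,  # bigger batches than the default 1000
                                           index=["index-1", "index-2", "index-3"])]

with ThreadPoolExecutor(max_workers=N_SLICES) as pool:
    parts = pool.map(scan_slice, range(N_SLICES))
df = pd.DataFrame([row for part in parts for row in part])

Would something along these lines actually help, or does the bottleneck lie elsewhere?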
Another question: after the API request, the data is filtered directly in my Python script, which nearly halves the number of hits. The same filter can also be built in Elastic/Kibana. Is there a way to create an index from that filter in Elastic and then request it via the API?
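To clarify what I mean: I imagine something like the _reindex API, where the Kibana filter is applied once on the Elastic side and the result is stored as a smaller index that I can scan instead of the originals. The index names and the term filter below are just placeholders for my real filter:

# hypothetical sketch: materialize the filtered subset as its own index
reindex_body = {
    "source": {
        "index": ["index-1", "index-2", "index-3"],
        "query": {
            "term": {"status": "active"}  # placeholder for my actual filter
        }
    },
    "dest": {"index": "filtered-index"}   # placeholder destination index
}
# run asynchronously, since copying 100m+ documents will take a while
client.reindex(body=reindex_body, wait_for_completion=False)

Is this the intended way to do it, or is there a better mechanism for this?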
Thanks a lot
I tried to load all hits from Elastic into a DataFrame.