
I want to use approximate KNN search in Elasticsearch. This requires the vector fields to be mapped as dense_vector with additional parameters. I would like to use eland to upload CSVs of my data, including the embeddings.

For example, I have the following data frame.

from pandas import DataFrame

f = DataFrame({"text": ["blue square", "red triangle"], "embedding": [[1.0, 2.0], [3.0, 4.0]]})
           text   embedding
0   blue square  [1.0, 2.0]
1  red triangle  [3.0, 4.0]

I want to run pandas_to_eland(f, ...) to create a new index containing the data in the CSV. I want the "embedding" field to be a dense vector that can be used by KNN search, which requires that its mapping look like this:

"embedding": {
  "type": "dense_vector",
  "dims": 2,
  "index": true,
  "similarity": "cosine"
}
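As a minimal sketch, the fragment above could be generated from the vectors themselves rather than typed by hand, so that "dims" always matches the data. The helper name dense_vector_field is hypothetical (it is not part of eland or the Elasticsearch client), and the code only builds a dictionary; it does not talk to a cluster.

```python
from typing import Any, Dict, Sequence


def dense_vector_field(
    vectors: Sequence[Sequence[float]], similarity: str = "cosine"
) -> Dict[str, Any]:
    """Build a dense_vector field mapping, taking "dims" from the data.

    Hypothetical helper for illustration only.
    """
    # Every vector in the column must have the same length.
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError(f"expected one vector length, found {sorted(lengths)}")
    return {
        "type": "dense_vector",
        "dims": lengths.pop(),
        "index": True,
        "similarity": similarity,
    }


# The "embedding" column from the example data frame.
embeddings = [[1.0, 2.0], [3.0, 4.0]]
field = dense_vector_field(embeddings)
print(field)
```

With a data frame, the column could be passed as f["embedding"].tolist().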

Obviously I could just manually create the mappings for the entire CSV before indexing the data, but that will get tedious for larger data frames. I'd like to have eland/Elasticsearch figure out all the mappings and customize my "embedding" column in code at the time the index is created.

pandas_to_eland(f, ...) does have an es_type_overrides parameter that allows you to specify a mapping data type, but it does not enable any further mapping customization.
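To make the gap concrete: es_type_overrides is a dictionary from column name to Elasticsearch type name, for example {"embedding": "dense_vector"}. The sketch below shows the kind of richer override being asked for, where extra per-field parameters are merged into the type. The names expand_overrides and field_params are hypothetical and are not part of eland; this is a plain dictionary transformation, and nothing here touches a cluster.

```python
from typing import Any, Dict


def expand_overrides(
    type_overrides: Dict[str, str],
    field_params: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Turn {column: es_type} plus per-column extras into mapping properties.

    Hypothetical helper for illustration only; eland does not provide it.
    """
    # Extra parameters only make sense for columns that have a type.
    unknown = set(field_params) - set(type_overrides)
    if unknown:
        raise KeyError(f"extra parameters for untyped columns: {sorted(unknown)}")
    return {
        column: {"type": es_type, **field_params.get(column, {})}
        for column, es_type in type_overrides.items()
    }


properties = expand_overrides(
    {"text": "text", "embedding": "dense_vector"},
    {"embedding": {"dims": 2, "index": True, "similarity": "cosine"}},
)
print(properties["embedding"])
```

The resulting properties dictionary has the same shape as the "embedding" mapping shown earlier, with one entry per overridden column.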

Is there some way to get eland to do this without manually creating the entire mapping beforehand?

W.P. McNeill