I want to use approximate KNN search in Elasticsearch. This requires some fields to have a dense vector mapping type with additional parameters. I would like to use eland to upload CSVs of my data, including the embeddings.
For example, I have the following data frame.
from pandas import DataFrame
f = DataFrame({"text": ["blue square", "red triangle"], "embedding": [[1.0, 2.0], [3.0, 4.0]]})
           text   embedding
0   blue square  [1.0, 2.0]
1  red triangle  [3.0, 4.0]
I want to run pandas_to_eland(f, ...) to create a new index containing the data in the CSV. I want the "embedding" field to be a dense vector that can be used by KNN search, which requires its mapping to look like this:
"embedding": {
    "type": "dense_vector",
    "dims": 2,
    "index": true,
    "similarity": "cosine"
}
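For illustration, that fragment does not have to be typed by hand; it can be derived from the embedding column itself. The following is only a minimal sketch: dense_vector_mapping is a hypothetical helper (not part of eland or the Elasticsearch client), it assumes every vector in the column has the same length, and the "cosine" default is just the value from the mapping above.

```python
def dense_vector_mapping(vectors, similarity="cosine"):
    """Build a dense_vector field mapping from a column of vectors.

    Hypothetical helper for illustration only; not an eland API.
    """
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector to infer dims")
    # Infer the dimensionality from the first vector.
    dims = len(vectors[0])
    # Reject ragged columns, since a mapping declares a single dims value.
    if any(len(v) != dims for v in vectors):
        raise ValueError("all vectors must have the same length")
    return {
        "type": "dense_vector",
        "dims": dims,
        "index": True,
        "similarity": similarity,
    }


# Same values as the "embedding" column of the example data frame.
embedding_mapping = dense_vector_mapping([[1.0, 2.0], [3.0, 4.0]])
print(embedding_mapping)
```

With the data frame above, the input would be f["embedding"].tolist(). This only produces the mapping for one field; it does not by itself make pandas_to_eland apply it.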
Obviously I could just manually create the mappings for the entire CSV before indexing the data, but that will get tedious for larger data frames. I'd like to have eland/Elasticsearch figure out all the mappings and customize my "embedding" column in code at the time the index is created.
pandas_to_eland(f, ...) does have an es_type_overrides parameter that allows you to specify a mapping data type, but it does not enable any further mapping customization.
Is there some way to get eland to do this without manually creating the entire mapping beforehand?