I have an Elasticsearch (v7.5.1) index with a dense_vector field called lda, with 150 dimensions. The mapping, as shown at http://localhost:9200/documents/_mapping, looks like this:
"documents": {
"mappings": {
[...]
"lda": {
"type":"dense_vector",
"dims":150
}
}
}
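The same mapping can also be fetched from Python; a minimal sketch, assuming the same unsecured local cluster:

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="localhost:9200")
# Prints the same JSON as the _mapping URL above; "dims": 150 should
# appear under the "lda" field if the mapping was applied.
print(es.indices.get_mapping(index="documents"))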
When I try to index a document through the Elasticsearch Client for Python (v7.1.0), ES throws this error message:
{"type": "server", "timestamp": "2020-01-03T08:40:04,962Z", "level": "DEBUG", "component": "o.e.a.b.TransportShardBulkAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents][0] failed to execute bulk item
(create) index {[documents][document][S_uPam8BUsDzizMKxpRR], source[{\"id\":42129,[...],\
"lda\":[0.031139032915234566,0.02878846414387226,0.026767859235405922,0.025012295693159103,0.02347283624112606,0.022111890837550163,0.02090011164546013,0.019814245402812958,0.0188356414437294,0.01794915273785591,0.01714235544204712,0.01640496961772442,0.015728404745459557,0.
015105433762073517,0.014529934152960777,0.013996675610542297,0.013501172885298729,0.013039554469287395,0.012608458288013935,0.012204954400658607,0.011826476082205772,0.011470765806734562,0.011135827749967575,0.010819895192980766,0.01052139326930046,0.010238921269774437,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]}]}", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id":
"M_fMZ3KxQnWP3AiguV1_jA" ,
"stacktrace": ["org.elasticsearch.index.mapper.MapperParsingException: The [dims] property must be specified for field [lda].", [22/1876]
"at org.elasticsearch.xpack.vectors.mapper.DenseVectorFieldMapper$TypeParser.parse(DenseVectorFieldMapper.java:104) ~[?:?]",
"at org.elasticsearch.index.mapper.DocumentParser.createBuilderFromFieldType(DocumentParser.java:680) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseDynamicValue(DocumentParser.java:826) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:619) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseNonDynamicArray(DocumentParser.java:601) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseArray(DocumentParser.java:560) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:420) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:395) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:112) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:71) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:791) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:768) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:740) ~[elasticsearch-7.5.1.jar:7.5.1]",
[...]
This is how documents are added to the index programmatically:
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="localhost:9200")
# self.index is "documents"; doc_type is "document", matching the log above
es.index(index=self.index, doc_type=doc_type, body=document_data)
Where document_data is a dictionary holding the data shown in the error log above, including this:
{
[...]
"lda": [0.031139032915234566, ...]
}
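For what it's worth, the vector itself can be sanity-checked before indexing (a small sketch; document_data as above):

# The vector should contain exactly 150 floats,
# matching "dims" in the mapping.
assert len(document_data["lda"]) == 150
assert all(isinstance(x, float) for x in document_data["lda"])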
The index was created immediately beforehand, so it does not contain any documents yet. I noticed this output when creating the index:
{"type": "server", "timestamp": "2020-01-03T08:40:03,280Z", "level": "INFO", "component": "o.e.c.m.MetaDataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents] creating index, cause [api],
templates [], shards [1]/[1], mappings [_doc]", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA" }
{"type": "deprecation", "timestamp": "2020-01-03T08:40:04,940Z", "level": "WARN", "component": "o.e.d.r.a.d.RestDeleteAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[types removal] Specifying types in docume
nt index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA" }
This is how the index has been created:
from elasticsearch import Elasticsearch

# BSONEncoder is a custom serializer, not part of elasticsearch-py
es = Elasticsearch(hosts="localhost:9200", serializer=BSONEncoder())
es.indices.create(index="documents", body=mapping)
Where mapping contains a dictionary defining the mappings, as shown in the output above:
mapping = {
"mappings": {
"properties": {
[...],
"lda": {
"type": "dense_vector",
"dims": 150
},
}
}
}
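Put together, a minimal script along these lines should reproduce the failure (a sketch; application-specific fields and the custom serializer stripped out):

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="localhost:9200")

mapping = {
    "mappings": {
        "properties": {
            "lda": {"type": "dense_vector", "dims": 150}
        }
    }
}
es.indices.create(index="documents", body=mapping)

# This mirrors the failing call; doc_type="document" matches the
# type shown in the error log above.
es.index(index="documents", doc_type="document", body={"lda": [0.0] * 150})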
Update:
I suspect that the mappings are indeed the problem. Indexing a document without the lda field also fails:
RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have mo[...]
So, I edited the mappings to include the document type name:
"mappings": {
"document": {
[...]
"lda": {
"type":"dense_vector",
"dims":150
}
}
}
}
This results in an empty mapping though, with the field types being inferred while indexing documents.
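As far as I understand the types removal in 7.x, a typed mapping body like the one above would only be accepted when the index is created with the include_type_name parameter; something like this (a sketch, untested; typed_mapping is the dictionary shown above):

es.indices.create(
    index="documents",
    body=typed_mapping,
    params={"include_type_name": "true"},
)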
--- End update ---
I am not sure how to proceed with debugging. The deprecation warning when creating the index seems potentially relevant, but I'm not sure how to resolve it. Furthermore, the error message does not really seem to indicate that this was the problem.
The documentation for the dense_vector type does not reveal many details, but the examples shown there do work (using cURL requests).
Is there a functional difference between how an index is created through Python and how it is created through cURL?
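One way to test this would be to bypass the client and send the mapping over plain HTTP, mirroring the cURL examples (a sketch using the requests library; mapping is the dictionary from above):

import requests

# Send the mapping as plain JSON, without the elasticsearch-py client.
# If this succeeds where the client call fails, the difference lies in
# the client setup (for instance the custom serializer).
resp = requests.put("http://localhost:9200/documents", json=mapping)
print(resp.json())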
How can I find out what the real error is? The dimensionality is clearly defined through the dims property.