
I have an ElasticSearch (v7.5.1) index with a dense_vector field called lda, with 150 dimensions. The mapping, as shown on http://localhost:9200/documents/_mapping, looks like this:

"documents": {
  "mappings": {
    [...]
    "lda": {
      "type":"dense_vector",
      "dims":150
    }
  }
}

When I try to index a document through the Elasticsearch Client for Python (v7.1.0), ES throws this error message:

{"type": "server", "timestamp": "2020-01-03T08:40:04,962Z", "level": "DEBUG", "component": "o.e.a.b.TransportShardBulkAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents][0] failed to execute bulk item
 (create) index {[documents][document][S_uPam8BUsDzizMKxpRR], source[{\"id\":42129,[...],\
"lda\":[0.031139032915234566,0.02878846414387226,0.026767859235405922,0.025012295693159103,0.02347283624112606,0.022111890837550163,0.02090011164546013,0.019814245402812958,0.0188356414437294,0.01794915273785591,0.01714235544204712,0.01640496961772442,0.015728404745459557,0.
015105433762073517,0.014529934152960777,0.013996675610542297,0.013501172885298729,0.013039554469287395,0.012608458288013935,0.012204954400658607,0.011826476082205772,0.011470765806734562,0.011135827749967575,0.010819895192980766,0.01052139326930046,0.010238921269774437,0.0,0
.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]}]}", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id":
"M_fMZ3KxQnWP3AiguV1_jA" , 
"stacktrace": ["org.elasticsearch.index.mapper.MapperParsingException: The [dims] property must be specified for field [lda].",                                                                                                            [22/1876]
"at org.elasticsearch.xpack.vectors.mapper.DenseVectorFieldMapper$TypeParser.parse(DenseVectorFieldMapper.java:104) ~[?:?]",                                                                                                                        
"at org.elasticsearch.index.mapper.DocumentParser.createBuilderFromFieldType(DocumentParser.java:680) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                            
"at org.elasticsearch.index.mapper.DocumentParser.parseDynamicValue(DocumentParser.java:826) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                     
"at org.elasticsearch.index.mapper.DocumentParser.parseValue(DocumentParser.java:619) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                            
"at org.elasticsearch.index.mapper.DocumentParser.parseNonDynamicArray(DocumentParser.java:601) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                  
"at org.elasticsearch.index.mapper.DocumentParser.parseArray(DocumentParser.java:560) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                            
"at org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:420) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                      
"at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:395) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                   
"at org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:112) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                 
"at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:71) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                          
"at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:267) ~[elasticsearch-7.5.1.jar:7.5.1]",                                                                                                                                 
"at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:791) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:768) ~[elasticsearch-7.5.1.jar:7.5.1]",
"at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:740) ~[elasticsearch-7.5.1.jar:7.5.1]",
[...]

This is how documents are added to the index programmatically:

es = Elasticsearch(hosts="localhost:9200")
es.index(index=self.index, doc_type=doc_type, body=document_data)

Where document_data is a dictionary, holding the data as shown in the error log above, including this:

{
  [...]
  "lda": [0.031139032915234566, ...]
}

The index was created immediately before, so there are no documents in it yet. I noticed that when I created the index, there was this output:

{"type": "server", "timestamp": "2020-01-03T08:40:03,280Z", "level": "INFO", "component": "o.e.c.m.MetaDataCreateIndexService", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[documents] creating index, cause [api], 
templates [], shards [1]/[1], mappings [_doc]", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA"  }                                                                                                                                                   
{"type": "deprecation", "timestamp": "2020-01-03T08:40:04,940Z", "level": "WARN", "component": "o.e.d.r.a.d.RestDeleteAction", "cluster.name": "docker-cluster", "node.name": "8d468383f2cf", "message": "[types removal] Specifying types in docume
nt index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).", "cluster.uuid": "7irLdTC_S7eXwYcVFolppQ", "node.id": "M_fMZ3KxQnWP3AiguV1_jA"  }

This is how the index has been created:

    es = Elasticsearch(hosts="localhost:9200", serializer=BSONEncoder())
    es.indices.create(index="documents", body=mapping)

Where mapping contains a dictionary defining the mappings as shown in the output above:

mapping = {
  "mappings": {
    "properties": {
      [...],
      "lda": {
          "type": "dense_vector",
          "dims": 150
      },
    }
  }
}
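For comparison, the typeless shape the 7.x API expects can be sanity-checked before sending; a minimal sketch with only the lda field (the other fields from the question are omitted):

```python
# Typeless 7.x mapping body: "properties" sits directly under "mappings",
# with no type name in between
mapping = {
    "mappings": {
        "properties": {
            "lda": {
                "type": "dense_vector",
                "dims": 150,  # must match the length of every indexed vector
            }
        }
    }
}

# Sanity check: the only key under "mappings" is "properties"
assert set(mapping["mappings"]) == {"properties"}
```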

Update: I suspect that the mappings are indeed the problem. Indexing a document without the lda field also fails:

RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have mo

So, I edited the mappings to nest the fields under a type name, document:

  "mappings": {
    "document": {    
      [...]
      "lda": {
        "type":"dense_vector",
        "dims":150
      }
    }
  }
} 

This results in an empty mapping though, with the types being inferred while indexing documents.

--- End update ---

I am not sure where to proceed with debugging. The deprecation warning when creating the index seems potentially relevant, but I'm not sure how to resolve it, and the error message does not really seem to point to it as the problem.

The documentation for the dense_vector type does not reveal many details. The examples shown there do work, however (using cURL requests).

Is there a functional difference between how an index is created through Python and through cURL?

How can I find out what the real error is? The dimensionality is clearly defined through the dims property.

Carsten
  • the following error means that Elasticsearch doesn't read your mapping: The [dims] property must be specified for field [lda]. How did you send your mapping to ES? – Lupanoide Jan 03 '20 at 09:07
  • I've added some details about the index creation part. The mappings are only sent during the creation process. As they are shown on the index page (http://localhost:9200/documents/_mapping), I understand they have actually been processed as intended. – Carsten Jan 03 '20 at 09:13

1 Answer


You are using ES 7.x, which no longer supports custom doc_types. This is also stated in the message returned during index creation:

[types removal] Specifying types in document index requests is deprecated, use the typeless endpoints

But you set a doc_type in your index call:

es.index(index=self.index, doc_type=doc_type, body=document_data)

From version 7 on, the only accepted doc_type is _doc, but you set your own, document. This produces an error, and your mapping update is rejected by Elasticsearch:

RequestError: RequestError(400, 'illegal_argument_exception', 'Rejecting mapping update to [documents] as the final mapping would have more ......

That is, more than one type: _doc and document.

To resolve the problem, remove the doc_type: both the doc_type argument when indexing documents and any type name in the mapping used at index creation.
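In client code, the fix amounts to dropping the doc_type keyword. A hedged sketch, where index_document is a hypothetical helper and the 150-dimension check mirrors the mapping from the question:

```python
def index_document(es, index, document_data):
    """Index one document on ES 7.x without a doc_type.

    Omitting doc_type makes the client use the typeless /{index}/_doc
    endpoint, so the mapping is not asked to grow a second type.
    """
    # Guard against vectors that don't match the mapping's dims
    lda = document_data.get("lda")
    if lda is not None and len(lda) != 150:
        raise ValueError("lda must have exactly 150 dimensions")
    return es.index(index=index, body=document_data)
```

Called as index_document(es, "documents", document_data) in place of the es.index(..., doc_type=doc_type, ...) call from the question.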

Lupanoide
  • The `doc_type` was the reason indeed. This was there due to a migration from ES 6.x. – Carsten Jan 03 '20 at 10:12
  • Had the same issue, but realized I was using the ES Python client 6.x; resolved by updating it. As noted above, ES 7 does not have doc_type. If indexing a dense_vector through Python, remember to convert the vector to a list (e.g. ndarray.tolist()) – user8291021 Apr 16 '20 at 16:13
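Following up on the last comment: if the vector comes from NumPy, it has to be converted before indexing, because the client's JSON serializer cannot handle an ndarray. A sketch, assuming NumPy is available and a 150-dimension vector as in the question:

```python
import numpy as np

lda_vector = np.zeros(150)  # placeholder; in practice the LDA topic distribution

document_data = {
    "id": 42129,
    # JSON serialization rejects ndarray, so convert to a plain list of floats
    "lda": lda_vector.tolist(),
}
```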