
Using sparse vectors in Elasticsearch runs into two dimension-related limits. On the one hand, a vector should not have more than 1024 elements.

That first limit can be worked around, as seen in this question.

The second limit is not about the number of elements in a single sparse vector, but about the dimension indices of those elements. For example, with 20 dimensions we could have these two vectors:

v1 = {"1": 0.01, "7": 0.2, "0": 0.4}
v2 = {"19": 0.02, "11": 0.7}

with only 3 and 2 elements respectively. Note that the keys range from 0 to 19, as strings.

These dictionary keys (sparse vectors are passed to Elasticsearch as JSON dictionaries) are integers encoded as strings, and they cannot go beyond the funny number 65535, i.e. 2^16 - 1.
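
For reference, here is a minimal sketch of how such vectors could be mapped and indexed, assuming an Elasticsearch 7.x cluster with the X-Pack sparse_vector field type and the official Python client; the index and field names are purely illustrative.

# Minimal sketch: assumes Elasticsearch 7.x, the X-Pack sparse_vector field
# type and the official Python client; index/field names are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The mapping declares only the field type; no fixed dimension is given.
es.indices.create(
    index="docs",
    body={"mappings": {"properties": {"term_vector": {"type": "sparse_vector"}}}},
)

# Keys are vocabulary ids encoded as strings; only non-zero entries are stored.
v1 = {"1": 0.01, "7": 0.2, "0": 0.4}
v2 = {"19": 0.02, "11": 0.7}

es.index(index="docs", id=1, body={"term_vector": v1})
es.index(index="docs", id=2, body={"term_vector": v2})

# A key above 65535 is rejected at index time; this is the limit in question.
# es.index(index="docs", id=3, body={"term_vector": {"65536": 0.5}})  # -> error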

I am guessing this 65535 limit might have something to do with the default limit on file descriptors, which is also 65535; that seems too suspicious to be unrelated.

Are these limits actually related? And is it possible to bypass the limitation on sparse vector keys? In my case the dimension of the sparse vectors is determined by a vocabulary, so reducing it would harm results (I am not so worried about query performance, though).

Pablo

1 Answer


They actually increased the maximum number of dimensions from 500 to 1024 in the past in order to satisfy requirements for larger models. The only way to raise the limit further is to edit this configuration and install Elasticsearch from source.
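
As a quick illustration of that limit, here is a sketch (assuming Elasticsearch 7.x and the official Python client, shown for the dense_vector case; the index and field names are made up) in which a mapping above 1024 dimensions is rejected, which is why raising the limit means patching the source:

# Sketch: assumes Elasticsearch 7.x and the official Python client;
# index/field names are made up. Mappings above 1024 dims are rejected.
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import RequestError

es = Elasticsearch("http://localhost:9200")

try:
    es.indices.create(
        index="too-wide",
        body={
            "mappings": {
                "properties": {
                    "embedding": {"type": "dense_vector", "dims": 2048}
                }
            }
        },
    )
except RequestError as err:
    print(err.error)  # mapper_parsing_exception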

Not sure about the dictionary key issue.

However, in my experience dense vector search in Elasticsearch is very slow, so I created NBoost, an out-of-the-box platform for improving search relevance with SOTA models.

Hope this helps!

Cole Thienes