
We are running Milvus with the default configuration for a CPU deployment. With every new record ingested into Milvus, we were rebuilding the index for the collection. However, once a single workspace reached about 150K records, the time needed to build the index grew to around half a minute.

So we stopped building the index manually and let Milvus rebuild it based on index_file_size. But something was off: searching in another workspace, which had been indexed properly before we disabled manual indexing, became much slower than before.

So my questions are:

  • Can the index in one workspace be affected by a non-indexed workspace?
  • Why do inserting and building the index take so long?
  • How do I choose the right index_file_size? Do you have any general suggestions for running CPU Milvus in production?

1 Answer


I am assuming you are using Milvus 1.x. I am not familiar with the term 'workspace', so I am assuming that you are referring to a collection or a partition.

For your first question: can the index in one workspace be affected by a non-indexed workspace?

I am assuming that you are asking: can a collection's ongoing index task be affected by a collection that is not being indexed?

A: Sure it can. Milvus 1.x is a standalone solution, and different tasks share the same resources. Even though the second collection is not being indexed, search tasks on it can still take up a lot of resources, because search is very CPU-intensive.

Why do inserting and building the index take so long?

Inserting should not take long; please check whether the time is being spent on network I/O. Building an index is very CPU-intensive, and how long it takes depends on the size of the data, the type of index, and the machine hosting Milvus. If it takes too long, you can consider using a GPU or switching to another kind of index.
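To see where the time actually goes, it can help to time the insert and the index build separately on the client side. The following is only a minimal sketch: `fake_insert` and `fake_build_index` are hypothetical stand-ins that just sleep, and you would replace them with your real Milvus client calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_insert(vectors):
    # Hypothetical stand-in for the real client insert call.
    time.sleep(0.01)
    return len(vectors)


def fake_build_index():
    # Hypothetical stand-in for the real index build call.
    time.sleep(0.02)


timings = {}
vectors = [[0.0] * 8 for _ in range(100)]

with timed("insert", timings):
    inserted = fake_insert(vectors)

with timed("build_index", timings):
    fake_build_index()

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

If the insert time stays high even for small batches, that points towards network or client overhead rather than Milvus itself.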

How do I choose the right index_file_size? Do you have any general suggestions for running CPU Milvus in production?

If no data is being added continuously, a large index_file_size can greatly benefit search performance. However, if data is still being added, you would want a smaller index_file_size, because the segment currently receiving inserts is not indexed, and that can hurt overall search performance.
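To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes float32 vectors of dimension 128 (an assumption, your dimension may differ) and a simplified model in which a segment is indexed only once it has filled up to index_file_size; the function names are made up for illustration.

```python
def rows_per_segment(index_file_size_mb, dim, bytes_per_value=4):
    """Approximate number of vectors that fit in one segment of the given size."""
    return (index_file_size_mb * 1024 * 1024) // (dim * bytes_per_value)


def unindexed_rows(total_rows, index_file_size_mb, dim):
    """Rows left in the last, not yet full (and so not yet indexed) segment."""
    return total_rows % rows_per_segment(index_file_size_mb, dim)


total_rows = 150_000  # collection size mentioned in the question
dim = 128             # assumed vector dimension

for size_mb in (1024, 64, 16):
    per_segment = rows_per_segment(size_mb, dim)
    tail = unindexed_rows(total_rows, size_mb, dim)
    print(f"index_file_size={size_mb} MB: {per_segment} rows/segment, "
          f"{tail} rows searched without an index")
```

Under these assumptions, a 1024 MB segment holds about 2 million vectors, so a 150K collection would never fill one and every search would scan all of it without an index. With 16 MB segments only the last 18,928 rows would be unindexed.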

As for the effect of index_file_size on index building performance: if the number of vectors is N, the complexity of building an IVF index is O(θ * N), with θ being a constant. So the overall cost should not be affected by index_file_size.
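The argument above can be checked with a small sketch. It assumes the linear cost model θ * n per segment and that every segment, including the last partial one, eventually gets indexed; `total_build_cost` is a made-up helper, not part of Milvus.

```python
def total_build_cost(total_rows, rows_per_segment, theta=1.0):
    """Sum the linear build cost theta * n over all segments."""
    full_segments, tail = divmod(total_rows, rows_per_segment)
    sizes = [rows_per_segment] * full_segments
    if tail:
        sizes.append(tail)
    return sum(theta * n for n in sizes)


total_rows = 150_000
for rows in (32_768, 131_072, 2_097_152):
    print(f"{rows} rows/segment -> total cost {total_build_cost(total_rows, rows)}")
```

Every segment size gives the same total, θ * N, since the segment sizes always add up to N. What changes is only how the work is split up: smaller segments mean more, shorter build tasks.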

Stan Shen