We want to search within various subsets of a large document repository. Our plan is to use a multivalued field that records, for each document, which subsets it currently belongs to, and to filter on that field when searching. The problem is that subset membership changes constantly, so we would frequently have to add new subsets to this field and remove old ones.
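To make the idea concrete, here is a minimal sketch of the filter we have in mind. The field name `subsets`, the subset ids, and the helper `subset_filter_params` are all made up for illustration; the only Solr features assumed are the standard `q` and `fq` request parameters.

```python
from urllib.parse import urlencode


def subset_filter_params(user_query, subset_ids, field="subsets"):
    """Return Solr select parameters that restrict a search to the given subsets.

    `field` is a hypothetical multivalued field holding subset ids.
    The ids are assumed to be simple tokens that need no query escaping.
    """
    # OR the subset ids together inside a single filter query (fq),
    # so a document matches if it belongs to any of the listed subsets.
    clause = " OR ".join(subset_ids)
    return {"q": user_query, "fq": f"{field}:({clause})"}


params = subset_filter_params("solr performance", ["favorites_42", "tag_search"])
# URL-encode the parameters for a request to the select handler.
print(urlencode(params))
```

The search itself stays in `q`, and the subset restriction lives entirely in `fq`, which is why the multivalued field has to be kept up to date.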
I have read that updating a single field in a Solr document means updating the whole document, and that the update is carried out by deleting the old copy and adding a new one. Frequent updates would therefore leave behind a lot of deleted copies, bloat the internal lookup table, and degrade performance.
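For reference, this is roughly the kind of update we would be sending each time membership changes. It is only a sketch: it assumes the atomic-update JSON modifiers (`add`, `remove`) are available in our Solr version, and the field name `subsets`, the document id, and the helper `subset_update` are hypothetical. As far as I understand, Solr still rewrites the whole document internally even when this syntax is used.

```python
import json


def subset_update(doc_id, op, subset_ids, field="subsets"):
    """Build one atomic-update command for a hypothetical `subsets` field.

    `op` is the update modifier, assumed here to be "add" or "remove".
    """
    if op not in ("add", "remove"):
        raise ValueError(f"unsupported modifier: {op}")
    # The modifier map tells Solr to change only this field's values,
    # rather than replacing the document with the fields given here.
    return {"id": doc_id, field: {op: list(subset_ids)}}


commands = [
    subset_update("doc-1", "add", ["s7"]),
    subset_update("doc-1", "remove", ["s3"]),
]
# JSON body that would be posted to the update handler.
print(json.dumps(commands))
```

With membership changing constantly, we would be issuing many such commands per document, which is where the concern about accumulated deleted copies comes from.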
My question is: how serious is this degradation, and is there a better way to approach the problem? It seems like it should be a common one. Two examples that immediately come to mind are searching for articles with a specific tag and searching within a user's favorite articles (although our own use case is more complex).
I have looked briefly at ExternalFileField, but it seems that it doesn't support multivalued fields (I hope I'm wrong). There are also too many possible combinations of subsets to represent each combination with a single integer, which would otherwise let us turn the multivalued field into a single-valued one.
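To show why the single-integer encoding does not scale for us, here is a small sketch of the bitmask idea. The subset names and the helpers `combination_code` and `combination_count` are made up for illustration; the point is only that the number of possible combinations doubles with every subset.

```python
def combination_code(member_of, all_subsets):
    """Encode a document's subset membership as one integer (a bitmask).

    Each subset gets one bit, by its position in `all_subsets`.
    """
    code = 0
    for bit, name in enumerate(all_subsets):
        if name in member_of:
            code |= 1 << bit
    return code


def combination_count(num_subsets):
    """Number of distinct membership combinations for `num_subsets` subsets."""
    return 2 ** num_subsets


all_subsets = ["s0", "s1", "s2"]
print(combination_code({"s0", "s2"}, all_subsets))
for n in (10, 32, 64):
    print(n, combination_count(n))
```

Beyond the size of the codes, filtering on "documents in subset X" would then mean matching every code that has X's bit set, not a single value, so the encoding does not give us a simple single-valued filter either.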