I have an existing Lucene store with many millions of documents, each one representing metadata for an entity. I have a few Id fields (Id1, Id2 .. Id5), and each document can have zero or many values for each of these fields. The index is only ever queried by one of these Ids at a time. I've indexed these fields independently and it is all working great. I initially chose Lucene because it was by far the fastest way to query such a vast number of small documents, and I am happy with my decision.
However, I must now store another type of document, which represents a different kind of metadata for entities. These documents also have values for (Id1, Id2 .. Id5) and will also be queried by one of those Ids at a time. The existing metadata and this new set of data will be stored and queried independently of each other.
How do I query Lucene by an Id, but for only one type of document? I can think of a few options, but I'd like to know what those in the know recommend from experience in order to keep Lucene manageable and fast.
- Use separate Lucene indexes. This would avoid the problem, since the document types are orthogonal. There's also the benefit of being able to read from and write to the indexes separately.
- Rename the fields Id1..Idn for the new documents to XId1..XIdn. That way, documents of one type would not share field names with documents of another type. This seems like more of a workaround to avoid the problem than an actual solution.
- Add a numeric field "Type" and change the indexes to (Type, Idx). This method seems wasteful, as each index would also have to contain the type.
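To make the renaming option concrete, here is a minimal sketch of the naming scheme I have in mind, in plain Java with no Lucene calls. The class `TypedFields`, its methods, and the idea of using an empty prefix for the existing documents are hypothetical names and assumptions of mine, not anything Lucene provides; the resulting string would simply be used wherever a field name is expected when indexing or querying.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derives per-type field names so two document types never share a field. */
final class TypedFields {
    private TypedFields() {}

    /** Number of Id fields per document, as described above (Id1 .. Id5). */
    static final int ID_FIELD_COUNT = 5;

    /**
     * Builds the field name for one Id slot of one document type.
     * An empty prefix keeps the existing names, e.g. ("", 1) gives "Id1";
     * a prefix of "X" gives "XId1" for the new document type.
     */
    static String fieldName(String typePrefix, int idNumber) {
        if (idNumber < 1 || idNumber > ID_FIELD_COUNT) {
            throw new IllegalArgumentException("idNumber must be in 1.." + ID_FIELD_COUNT);
        }
        return typePrefix + "Id" + idNumber;
    }

    /** Lists every Id field name for one document type, in order. */
    static List<String> allFieldNames(String typePrefix) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= ID_FIELD_COUNT; i++) {
            names.add(fieldName(typePrefix, i));
        }
        return names;
    }
}
```

Adding a third document type under this scheme would only mean choosing another prefix, which is part of why I am unsure whether it is really just a workaround.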
I am able to break backwards compatibility with my existing setup. It would be great if the solution can be reused if I come to add another document type.