Duplicating all data in Lucene index

Question

We are creating Lucene indexes from data stored in an event store as a stream of events. Those indexes are used to provide efficient paging, sorting, and search over our data.

It turns out that we have to duplicate all of the data in the indexes in order to fulfill our requirements. Conceptually, what is the best way to query data in this situation?

I see 2 options:

1. Query all the data needed to build view models directly from the index.
2. Query only a list of ids from the index, then use those ids to load the data from the event store.

We are also concerned about scalability and fault tolerance, so I have to take those into account too. Any suggestions?

score 1 · Answer 1 · answered Mar 19 '19 at 14:35

There is a third way: store two kinds of data in the index, the fields you search on plus one extra field holding the complete serialized object. Getting data out of the index then takes much less time, and if the serialized form is JSON it can be used directly on the client side. Disadvantages: indexing takes more time, and the index is about twice as large (although the data can be compressed before it is stored in the index). Also, for security reasons some data shouldn't be sent straight back to the client.
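To make the idea concrete, here is a minimal sketch in Python. It does not use the Lucene API; a plain dict stands in for an index document, and the names `build_index_entry`, `load_view_model`, and `_payload` are made up for illustration. It assumes the object is JSON-serializable and uses zlib for the optional compression step.

```python
import json
import zlib

def build_index_entry(obj, search_fields):
    """Return the searchable fields plus one compressed JSON payload field."""
    entry = {name: obj[name] for name in search_fields}
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    entry["_payload"] = zlib.compress(payload)  # hypothetical field name
    return entry

def load_view_model(entry, hidden_fields=()):
    """Decode the stored payload and drop fields that must not reach the client."""
    obj = json.loads(zlib.decompress(entry["_payload"]).decode("utf-8"))
    return {key: value for key, value in obj.items() if key not in hidden_fields}

order = {"id": 42, "customer": "ACME", "total": 99.5, "internal_note": "VIP"}
entry = build_index_entry(order, search_fields=("id", "customer"))
view = load_view_model(entry, hidden_fields=("internal_note",))
```

In a real index the search fields would be indexed and the payload would only be stored, not analyzed. Filtering `hidden_fields` on the way out is one way to handle the security concern mentioned above.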

score 0 · Accepted Answer · answered Oct 09 '14 at 15:19

I guess option #1 is better. Store in the index only those pieces of data that you need to build the model for the paged/filtered table, and fetch them from there. It's lightning fast.

Hibernate Search uses an approach similar to option #2. It stores the id and Java class, looks them up in the index, then fetches the entity from the DB. This can be circumvented when it is too costly, though. I recently had a case where I did exactly that because the default behaviour was killing my DB. Works like a charm.
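For comparison, the option #2 flow (ids from the index, full data from the backing store) could look roughly like this. This is only a sketch: a list of dicts stands in for the index, a dict stands in for the DB or event store, and `search_ids` and `load_by_ids` are hypothetical helpers, not Hibernate Search or Lucene calls.

```python
def search_ids(index_docs, predicate, sort_field, page, page_size):
    """Filter, sort, and page inside the 'index'; return only the matching ids."""
    matches = sorted(
        (doc for doc in index_docs if predicate(doc)),
        key=lambda doc: doc[sort_field],
    )
    start = page * page_size
    return [doc["id"] for doc in matches[start:start + page_size]]

def load_by_ids(store, ids):
    """Load full records from the backing store, keeping the index's ordering."""
    return [store[i] for i in ids if i in store]

index_docs = [
    {"id": 1, "name": "bravo"},
    {"id": 2, "name": "alpha"},
    {"id": 3, "name": "charlie"},
]
store = {
    1: {"id": 1, "name": "bravo", "details": "first record"},
    2: {"id": 2, "name": "alpha", "details": "second record"},
    3: {"id": 3, "name": "charlie", "details": "third record"},
}

ids = search_ids(index_docs, lambda doc: doc["name"] != "charlie", "name", page=0, page_size=2)
models = load_by_ids(store, ids)
```

Every page costs a second round trip to the store, which is the part that can get expensive under load.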

I have never (across 4 projects) experienced index corruption, but reindexing should definitely be possible in the application.
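With an event store, reindexing can in principle be a replay of the event stream. A rough sketch, assuming each event is a dict with hypothetical `type`, `id`, and `data` keys, and using a dict in place of the real index:

```python
def rebuild_index(events):
    """Rebuild the index from scratch by replaying events in order."""
    index = {}
    for event in events:
        if event["type"] == "deleted":
            index.pop(event["id"], None)
        else:
            # "created" and "updated" both merge their data into the document
            doc = index.setdefault(event["id"], {"id": event["id"]})
            doc.update(event["data"])
    return index

events = [
    {"type": "created", "id": 1, "data": {"name": "a"}},
    {"type": "created", "id": 2, "data": {"name": "b"}},
    {"type": "updated", "id": 1, "data": {"name": "z"}},
    {"type": "deleted", "id": 2, "data": {}},
]
index = rebuild_index(events)
```

Because the index is derived entirely from the events, a corrupted or outdated index can be thrown away and rebuilt this way.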

Do you use event snapshots? They could be indexed as well.
