RCA needs more information, using Hibernate Search

Question

One of our customers, for whom we did custom development, is facing a strange issue. Although we are still only in pre-UAT, what we see does not conform to any known Lucene behavior.

We are using Hibernate Search 5.5.2 and Apache Lucene 5.3.x, with the index stored on the filesystem. The application runs inside a WebLogic 12c container, with Oracle 12c as the database.

We have two virtual machines, each hosting its own WebLogic 12c instance (and therefore its own copy of the application). Both point to the same database, and therefore the same data. At application startup, each node indexes the data onto its own filesystem. Yet the same query yields different results on each node!

Has anybody faced a similar issue? Is the indexing mechanism in any way tied to the hardware or a specific machine? I just cannot fathom the reason for this behavior.

My second question: if the clustered WebLogic 12c nodes are non-replicated (no form of replication at all), is it OK to index the same data separately on each node? Or is master-slave replication necessary? I am not asking from a maintainability point of view, but purely about the correctness of the results.

See the original question on the official hibernate search forum at: https://forum.hibernate.org/viewtopic.php?f=9&t=1043314

score 1 · Answer 1 · answered May 25 '16 at 16:59

It looks as if each node stores an index containing only the data modified through that node, so query results will vary depending on which node executes the query.

You should look into options for clustered set-ups, specifically the master/slave set-up using JMS or the Infinispan-based backend. Both will make sure that there is a single index containing all the data.
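As a rough illustration of the master/slave option, a configuration along the lines of the JMS example in the Hibernate Search 5.x reference documentation might look like the sketch below. The paths, the JNDI names, and the refresh interval are placeholders chosen for illustration, not values from the setup described in the question; the property names should be checked against the documentation for the exact version in use.

```properties
### Master node: the only node that writes to the index
# Shared location the master periodically copies its index to (placeholder path)
hibernate.search.default.sourceBase = /mnt/shared/lucenedirs/mastercopy
# Local working directory of the master index (placeholder path)
hibernate.search.default.indexBase = /opt/app/lucenedirs
# Copy interval in seconds (placeholder value)
hibernate.search.default.refresh = 1800
hibernate.search.default.directory_provider = filesystem-master

### Slave node: queries a local copy and forwards index changes to the master
hibernate.search.default.sourceBase = /mnt/shared/lucenedirs/mastercopy
hibernate.search.default.indexBase = /opt/app/lucenedirs
hibernate.search.default.refresh = 1800
hibernate.search.default.directory_provider = filesystem-slave
# Send index work to the master over JMS (JNDI names are placeholders)
hibernate.search.default.worker.backend = jms
hibernate.search.default.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch
```

The two halves belong in the configuration of different nodes. With this layout only the master ever writes the index, and every slave searches a periodically refreshed copy of it, so all nodes answer queries from the same index content once a refresh has completed.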

answered May 25 '16 at 16:59

Gunnar

No... We are not 'modifying' any data. The data is accessed read-only (for viewing purposes). Does this imply that each node indexes the same data from the same database differently? – Sumith Kumar Puri May 26 '16 at 02:07
How are you triggering indexing then? Are you using the mass indexer? You should only start it on one node of the cluster. – Gunnar May 26 '16 at 13:26
Think of it as a non-replicated cluster (no data replication), or view the issue for now simply as 'two independent nodes reading exactly the same data from the same database (Oracle RAC) node, indexing it, and then searching with exactly the same query term': the results returned are different! Is this the expected behavior? – Sumith Kumar Puri May 30 '16 at 10:16
In this case, if there are no independent modifications applied by the two nodes, and both rebuild the index from the same source, then yes, the indexes should be identical and so should the results. Have you made sure that exactly the same version of your application and its dependencies is deployed on both machines, and that the same connection properties and environment settings are in place? – Sanne Jun 01 '16 at 08:40
This behavior sounds odd; the index should indeed look the same. What are the differences exactly? Different field values, additional rows? Or do only the query results look different? That said, I'd recommend maintaining the index from only one node in this special case (assuming you get to the point where the index looks the same no matter which node created it). – Gunnar Jun 02 '16 at 07:45
Thanks Gunnar, Sanne. I used Luke to analyze this issue. For the same search term, on indexes created from exactly the same data in the same database (Oracle RAC), the search results are different. So it has nothing to do with our application code; it lies entirely with Lucene/Hibernate Search. The results differ in that one index returns additional or different matches that are missing from the other index's results. For now I have suggested master-slave replication to my customer, but something seems to be missing in the analysis of this issue... perhaps somewhere in our understanding of Hibernate Search/Lucene? – Sumith Kumar Puri Jun 07 '16 at 07:01
What was the outcome of your Luke analysis? Do the indexes differ? – Gunnar Jun 08 '16 at 07:22
