I would like to know whether it is possible to store a Lucene search index on Amazon Elastic File System (EFS) as an alternative to a local filesystem directory provider (such as the one Hibernate Search uses in Java, i.e. hibernate.search.default.indexBase). If so, what is the best way to implement this? Thanks in advance.
-
Have you tried it? Files on EFS behave like... files. It's a filesystem. – Michael - sqlbot Jan 18 '18 at 01:19
-
I am building a distributed system hosted on AWS Elastic Beanstalk. Right now I am implementing Hibernate Search, which works fine locally. I was wondering how to store the search index in a way that would work in a distributed environment in the cloud, apart from using Infinispan of course. I am impressed by the fact that Amazon EFS data is distributed across multiple Availability Zones, providing a high level of durability and availability. But I want to be sure of what I am doing. I am worried about speed and the related protocols. – geobudex Jan 18 '18 at 04:59
-
For a distributed system, consider a few things first, such as: where is the datasource or trigger/event that updates the index? One strategy is to have 1-to-1 associated index stores (index search instances) in each region, provided there are (synced) datasources in each region. DNS can then take care of the rest. Alternatively, if there is one global datasource, then concurrency on index updates MUST be considered, and in that case look at a distributed setup with Elasticsearch. It can be done with plain Lucene, but that is very complicated and is a problem ES has already solved. – Darrell Teague Mar 16 '20 at 17:45
2 Answers
So far there have been issues with search and index performance degradation when Apache Lucene runs on NFS, and EFS is accessed over NFS. With default settings, storing a Lucene search index on EFS would most likely cause the Linux host to lock up and produce a load of error messages. In my experience, because EFS is based on NFS, it isn't a good fit for Lucene (at least so far).
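For anyone who wants to measure this rather than take it on faith, here is a minimal sketch of the settings involved. It assumes Hibernate Search 5.x property names and a hypothetical EFS mount point passed in by the caller (for example `/mnt/efs`); the class and method names are illustrative only, and the `lucene/indexes` subdirectory is an arbitrary choice.

```java
import java.util.Properties;

// Illustrative helper, not part of Hibernate Search itself.
class EfsIndexSettingsSketch {

    // Builds the Hibernate Search settings that point the default index
    // at a directory under an (assumed) EFS mount point.
    static Properties searchProperties(String efsMountPath) {
        Properties props = new Properties();
        // Plain filesystem-backed Lucene directory.
        props.setProperty("hibernate.search.default.directory_provider", "filesystem");
        // Index files go under the mount; the subdirectory name is an assumption.
        props.setProperty("hibernate.search.default.indexBase", efsMountPath + "/lucene/indexes");
        // OS-level file locks; this is the setting that exercises NFS locking.
        props.setProperty("hibernate.search.default.locking_strategy", "native");
        return props;
    }
}
```

These properties would be merged into the usual persistence configuration. Whether the result behaves acceptably on EFS is exactly what needs testing under concurrent writes.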

-
"So far there has been with issues on search and index performance degradation with ... NFS" -- I would appreciate an edit to this answer so that one does not have to guess what is being said. I presume it should say, "So far there have been issues with search and index performance degradation with ... NFS." [And if that is true, what are the JIRA issues? The Lucene Documentation does not mention that NFS is a problem. In fact, IndexWriter seems to have a feature that supports NFS, so I am puzzled as to why this claim is being made. Also, is your experience really with NFS or EFS?] – AWhitford Aug 19 '18 at 06:31
I have built a few large Hibernate / Lucene indexes, and the primary issue is really file locking. Performance is a separate concern: a lot of corporate NFS farms are general-purpose storage, not tuned as a backing store for full-text search (where "sub-second" or close to it is the implied response time).
What happens in practice, for update-capable indexes, is that updating an index requires the file in question (and there are many files making up the indices) to be exclusively locked system-wide.
NFS has a long history of locking issues (I programmed against them in C back in the day) with the NFS "lock daemon" and associated models, wherein the process hangs, stalls, has to be restarted, etc.
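To make the locking point concrete, here is a minimal sketch of an exclusive lock on a lock file using only the JDK's `java.nio` API. The class and method names are hypothetical, and the file name `write.lock` is borrowed from Lucene purely for illustration; this is not Lucene's own code. It runs against a local temp directory, where a second writer in the same JVM is refused immediately. On an NFS mount the same call depends on the NFS lock handling, which is where the hangs described above tend to show up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: shows the kind of exclusive lock an index writer needs.
class IndexLockSketch {

    // Returns the lock, or null if some other holder already has it.
    static FileLock tryExclusive(FileChannel channel) throws IOException {
        try {
            return channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException alreadyHeldInThisJvm) {
            return null;              // thrown when this JVM already holds it
        }
    }

    // Simulates two writers competing for the same lock file.
    static boolean secondWriterIsBlocked() {
        try {
            Path dir = Files.createTempDirectory("index-lock-sketch");
            Path lockFile = dir.resolve("write.lock");
            try (FileChannel first = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileChannel second = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                FileLock held = tryExclusive(first);       // first writer wins
                FileLock contended = tryExclusive(second); // second writer is refused
                boolean blocked = held != null && contended == null;
                if (contended != null) contended.release();
                if (held != null) held.release();
                return blocked;
            } finally {
                // Channels are closed by now; remove the temp files.
                Files.deleteIfExists(lockFile);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `IndexLockSketch.secondWriterIsBlocked()` on a local disk should return `true`. Pointing the temp directory at a network mount instead is a quick way to see how that storage handles the same lock request.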

-
While this answers the question "are there issues with storing indexes on NFS?", what may be more useful is a replacement solution. I just delivered a full-text search service for a lab that stores files in AWS S3. I ended up building the index on an RDS database using the old Compass project, which (for Java at least) makes a database the backing store for the indexes. It is updatable, and the locking problems are resolved because the underlying transaction manager is an RI database (Aurora, based on MySQL) in which the records are the index BLOBs. – Darrell Teague Apr 15 '20 at 22:21