Azure App Service - Running Solr on Jetty - LockObtainFailedException after Azure maintenance

Question

I'm running a single (not scaled) Solr instance on an Azure App Service. The App Service runs Java 8 and a Jetty 9.3 container.

Everything works really well, but when Azure decides to swap to another VM, the JVM sometimes doesn't seem to shut down gracefully and we encounter issues.

One of the reasons for Azure to swap to another VM is infrastructure maintenance. For example, Windows Updates are installed and your app is moved to another machine.

To prevent downtime, Azure spins up the new app and swaps over to it once it is ready. That seems fine, but it does not seem to work well with Solr's locking mechanism.

We are using the default native lockType, which should be fine since we're only running a single instance. Solr should remove the write.lock file during shutdown, but this does not seem to happen all of the time.

The Azure Diagnostics tools clearly show this event happening:

And the memory usage shows both apps:

While the second instance starts, Solr tries to lock the index, but this is not possible because the first instance is still using it (and still holds the write.lock file). Sometimes the first instance doesn't remove the write.lock file, and this is where the problems start. The second Solr instance will then never work correctly without manual intervention (manually deleting the write.lock file).

The solr logs:

Caused by: org.apache.solr.common.SolrException: Index dir 'D:\home\site\wwwroot\server\solr\****\data\index/' of core '*****' is already locked. The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: native

and

org.apache.lucene.store.LockObtainFailedException

What can be done about this? I was thinking of changing the lockType to a memory-based lock, but I'm not sure if that would work because both instances are alive at the same time during a short period of time.

It depends on what you want to achieve. The problem here is clearly that the lock isn't released properly, so you could either run without a lock at all or write a custom LockFactory whose lock is guaranteed to be released when the JVM crashes. – Mysterion, Dec 21 '18 at 17:03
@Mysterion How would I configure my core/solr to not use a lock at all? – Rob, Dec 27 '18 at 08:54

score 3 · Answer 1 · answered May 10 '20 at 02:48

You could try setting the app setting WEBSITE_DISABLE_OVERLAPPED_RECYCLING=1.

Overlapped recycling makes it so that before the current instance on an app is shut down, a new instance starts. It can in some cases cause file locking issues, in which case you can try turning it off:

Reference

Mark Gibbons

Thanks, didn't know about this one. This seems like a better solution compared to adjusting the application's behavior. – Rob May 13 '20 at 11:28
Use the `Recreate` deployment strategy if you have the same issue on Kubernetes – Mark Lowe Aug 23 '23 at 13:56

score 1 · Answer 2 · answered Dec 28 '18 at 10:08

If you would like to run Solr without any locks at all, you can do so in your solrconfig.xml by replacing the usual <lockType>native</lockType> with <lockType>none</lockType>.

Obviously, you need to be careful with this mechanism: different Solr instances could try to change the index at the same time, which could corrupt it.
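
For illustration, here is roughly where that element sits. This is only a sketch that assumes the stock solrconfig.xml layout, in which lockType is a child of <indexConfig>; your core's file may be organised differently, and everything else in the file is omitted here.

```xml
<!-- solrconfig.xml (fragment only). Assumes the stock layout where
     lockType lives under indexConfig; adapt to your own core's config. -->
<config>
  <indexConfig>
    <!-- "none" disables index locking entirely, so nothing prevents two
         Solr instances from writing to the same index directory. -->
    <lockType>none</lockType>
  </indexConfig>
</config>
```

With this in place, the overlapping old and new instances would no longer block each other on write.lock, but they are also no longer protected from each other, so this only makes sense if you are sure that just one instance ever writes to the index.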

All available lock types are listed there
