I am using Java's file locking API on a Linux server to lock a file on a remote Linux NFS share. Some issues have popped up, and the logs show that two different cluster nodes running the same Java web server application are both able to acquire a lock on the same NFS file.

From what I have read online about how Java handles locks and about Linux file locking (we usually deploy on Windows servers, so we have very little Linux experience), what we do should work. We are essentially issuing advisory locks, but since both cluster nodes run the same code they are cooperating processes, and they check for lock acquisition before starting any read-write operations. However, this does not seem to hold in practice: both systems are able to acquire a lock on the file concurrently.
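
A minimal sketch of the pattern described above, assuming each node opens the shared file and only proceeds if `tryLock()` returns a lock. The class name `TryLockSketch`, the method `runIfLocked`, and the temp file used in `main` are placeholders for illustration, not our production code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;

public class TryLockSketch {

    /** Runs the task only if this JVM obtains an exclusive lock on the whole file. */
    public static boolean runIfLocked(Path file, Runnable task) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            // tryLock() returns null when another process already holds an overlapping lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            try {
                task.run(); // read-write work happens only while the lock is held
                return true;
            } finally {
                lock.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file; in the real setup this would be a path on the NFS mount.
        Path file = Files.createTempFile("lock-sketch", ".dat");
        boolean ran = runIfLocked(file, () -> System.out.println("lock held, doing work"));
        System.out.println("ran = " + ran);
        Files.deleteIfExists(file);
    }
}
```

The lock is advisory, so this only protects the file if every writer goes through the same check.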

Are these known issues? Many comments and articles online describe NFS file locking as unstable and say it should be avoided. Why is that? How would network connectivity issues (e.g. sudden communication drops) influence this behavior? The Linux kernel version should be quite recent.

PentaKon
  • 1
    What version of the NFS protocol are you using? NFS 2 and 3 don't support locking. What NFS mount options are you using? (On the affected NFS client nodes.) – Stephen C Aug 08 '22 at 15:13
  • 2
    This won't be Java's "fault". Java will simply be using the locking syscalls provided by the OS. The problems will be in the client operating system, the way the remote file system(s) are mounted, and/or the way that the NFS server(s) are set up. – Stephen C Aug 08 '22 at 15:19
  • @StephenC Which options should we look for, and which values should be configured to achieve the expected functionality? In the meantime I will ask our customer to provide the NFS version and NFS mount options they use. – PentaKon Aug 08 '22 at 15:36
  • 1
    They should require NFS 4 and (obviously) not use the `nolock` option. But I don't know if that is sufficient. Since it is not a programming question, this question should be asked on a different SE site. Maybe "ServerFault" or "UNIX & Linux" ... where there are people whose skills are in this area. And search for answers *there* too. – Stephen C Aug 08 '22 at 23:17
  • @StephenC So it's supposedly an NFSv4 with mount options `bg,hard,intr,nodev,rsize=32768,wsize=32768,rw,tcp,timeo=600` – PentaKon Aug 11 '22 at 11:39
  • See the last 2 sentences of my previous comment. This is not a programming question. StackOverflow is not a good place to ask gnarly NFS questions. – Stephen C Aug 11 '22 at 11:46
  • Yeah, in this thread I'm not looking for Linux advice, I'm mostly looking for knowledge regarding the way Java interprets file locking in Linux. – PentaKon Aug 11 '22 at 14:03
  • @StephenC so after some more testing, when calling `RandomAccessFile.getChannel().tryLock()` from a Java `main` method it works fine over NFSv4, but when the same code runs within Tomcat (8.5.68) multi-locking occurs. – PentaKon Aug 31 '22 at 11:51
  • 1
    I think your comment contains the missing clue we needed. See my answer. – Stephen C Sep 01 '22 at 00:06

2 Answers

@StephenC so after some more testing, when calling `RandomAccessFile.getChannel().tryLock()` from a Java `main` method it works fine over NFSv4, but when the same code runs within Tomcat (8.5.68) multi-locking occurs.

OK. So I think I understand the root of your problem now. From what you have said, it sounds to me like you are trying to use `FileLock` to stop one thread of your Tomcat JVM from locking a section of a file while another Tomcat thread has it locked.

That's not going to work.

The lock that you are using is a FileLock. A key paragraph of the javadoc states this:

File locks are held on behalf of the entire Java virtual machine. They are not suitable for controlling access to a file by multiple threads within the same virtual machine.

In this case, "not suitable" means "doesn't work".
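
To illustrate the javadoc's point, here is a small sketch (the class and method names are made up for illustration). It assumes a recent JDK with the default JVM-wide overlap check, as described in the `FileChannel.tryLock()` javadoc: a second attempt from the same JVM on an overlapping region does not wait for or cooperate with the first holder; it fails with `OverlappingFileLockException`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SameJvmLockDemo {

    /** Returns true if a second lock attempt from the same JVM is rejected with an exception. */
    public static boolean secondAttemptRejected(Path file) throws IOException {
        try (RandomAccessFile first = new RandomAccessFile(file.toFile(), "rw");
             RandomAccessFile second = new RandomAccessFile(file.toFile(), "rw")) {
            FileLock held = first.getChannel().tryLock();
            if (held == null) {
                return false; // some other process already holds the lock
            }
            try {
                // Same JVM, same file, different channel: the lock belongs to the JVM,
                // so this is not treated as a competing process.
                second.getChannel().tryLock();
                return false;
            } catch (OverlappingFileLockException e) {
                return true;
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("same-jvm", ".dat");
        System.out.println("second attempt rejected: " + secondAttemptRejected(file));
        Files.deleteIfExists(file);
    }
}
```

So threads inside one JVM need an ordinary in-process lock (for example a `ReentrantLock`) in addition to the `FileLock`.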

If you drill down to the Linux manual page for flock(2) (OpenJDK on Linux actually implements these locks with fcntl(2) record locks, but those are likewise owned by the process), you will see that the semantics are defined in terms of multiple processes, not multiple threads. For example:

LOCK_EX Place an exclusive lock. Only one process may hold an exclusive lock for a given file at a given time.

and

A call to flock() may block if an incompatible lock is held by another process.


So, in summary, it is still not Java's fault. You are trying to use FileLock in a way that Java doesn't support ... and could not support, given how Linux (and indeed POSIX) flock is specified.

(IMO, all of the stuff about NFS is a red herring. The above problem is not caused by NFS. The reason that it shows up on an NFS file system, is that NFS operations take longer and therefore the time window for overlapping operations on the same file is much larger. And if your customer's use-case is hammering their NFS ...)

(But if I am wrong and NFS is implicated, then your "main vs Tomcat" observation is inexplicable. The JVM will not be doing file locking differently in those two cases: it will be using the same OpenJDK code in both cases. Furthermore, the JVM won't even be aware that it is talking to an NFS file system. You can take a look at the OpenJDK codebase if you don't believe me. It's not that hard ...)

Stephen C
  • Very good answer, but not for my case. You missed the part (which I should maybe have bolded) about me having 2 cluster nodes, i.e. two different servers that are running the same application, i.e. 2 different JVMs. – PentaKon Sep 01 '22 at 07:59
  • I have been tracking this bug down, however, and I think the issue is in our code. Locking over Linux NFSv4 works fine when running a small test Java application, but the same code fails when it gets executed within our app. This means that our app does something that somehow unlocks the file... – PentaKon Sep 01 '22 at 08:00

I found the root cause of this issue. It seems that when two different threads of the same JVM each create a `RandomAccessFile` object on the same file, calling `RandomAccessFile.close` from one thread releases the lock the other thread holds.

The documentation of RandomAccessFile.close says

Closes this random access file stream and releases any system resources associated with the stream.

I'm not sure if this means that the system resources are released at the JVM level. I would expect each `RandomAccessFile` object to get its own stream and release only that one, but it seems that is not the case (or at least the lock on that file gets released). Note that this behavior has not been observed on Windows systems.
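
This matches a caveat in the `FileLock` javadoc: on some systems, closing a channel releases all locks the JVM holds on the underlying file, no matter which channel acquired them, and the javadoc recommends using a unique channel for all locks on a given file. Below is a minimal sketch of one way to follow that advice. It is not what our app does today, and `SingleChannelLock` and `tryRun` are hypothetical names. It assumes the lock is taken on a dedicated lock file that no other code in the JVM opens or closes.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

public final class SingleChannelLock implements AutoCloseable {

    // The only channel this JVM uses for locking the file; kept open until close().
    private final FileChannel channel;
    // Serialises threads inside this JVM, which FileLock does not do.
    private final ReentrantLock threadGate = new ReentrantLock();

    public SingleChannelLock(Path lockFile) throws IOException {
        this.channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /**
     * Runs the task while holding both the in-JVM gate and the file lock.
     * Returns false if another thread or another process currently holds the lock.
     * Not meant to be called re-entrantly from inside the task.
     */
    public boolean tryRun(Runnable task) throws IOException {
        if (!threadGate.tryLock()) {
            return false; // another thread of this JVM is inside
        }
        try {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // another process holds the lock
            }
            try {
                task.run();
                return true;
            } finally {
                lock.release();
            }
        } finally {
            threadGate.unlock();
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("single-channel", ".lock"); // placeholder path
        try (SingleChannelLock lock = new SingleChannelLock(lockFile)) {
            System.out.println("ran = " + lock.tryRun(() -> System.out.println("working")));
        }
        Files.deleteIfExists(lockFile);
    }
}
```

The instance would have to be shared by all threads (for example as a singleton per lock file); creating one per request would reintroduce the multiple-channel problem.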

PentaKon