I run a Proxmox cluster, and on this cluster I have a few VMs on a private network, with a (Proxmox-managed) Ceph storage backend for the VM disks.

One KVM VM running a minimal Ubuntu 16.04 server install is configured with a second virtual hard disk, set up as a single-disk ZFS pool named "storage", using

zpool create storage /dev/sdb1

which gets automounted to /storage. This VM also runs the nfs-kernel-server.
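
As a sanity check (not part of the setup itself), the pool and its mountpoint can be confirmed with the usual ZFS tooling:

zpool status storage
zfs list storage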

This directory is then exported via NFS with the following line in /etc/exports:

/storage        10.10.0.0/16(rw,sync)
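
(After editing /etc/exports, the export table has to be re-read; assuming the stock nfs-kernel-server package, the standard way is:)

exportfs -ra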

I mount this export from two other machines (one VM running Ubuntu 14.04, one physical machine running Ubuntu 16.04 server) with

mount -t nfs4 10.10.3.1:/storage /mnt
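
For a persistent mount, the equivalent /etc/fstab entry on the clients would look something like the following (the options shown are assumed defaults, not something I tested specifically):

10.10.3.1:/storage  /mnt  nfs4  defaults  0  0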

This is my playground for a planned setup of two web servers hosting an old Perl app that writes to Berkeley DB files, so I decided to test concurrent writes to my shared storage backend in a simple way, with a short PHP script:

<?php
    // Append 10,000 numbered lines; the CLI argument tags each writer.
    $line = str_repeat($argv[1], 30) . "\n";

    for ($i = 1; $i <= 10000; $i++) {
        $of = fopen("test.txt", "a") or die("can't open output file\n");
        fwrite($of, sprintf("%04d-", $i) . $line);
        fclose($of);
    }
?>

I change to the shared storage directory (where the PHP script is also located) and run it using

php test.php 1

from the first remote machine, and with

php test.php 2

from the second machine.

My issue is that some writes don't seem to make it to the destination file, i.e. I get output like this:

9286-222222222222222222222222222222
9287-222222222222222222222222222222
9288-222222222222222222222222222222
9289-222222222222222222222222222222
7473-111111111111111111111111111111
7474-111111111111111111111111111111
7475-111111111111111111111111111111
7476-111111111111111111111111111111
7477-111111111111111111111111111111
7478-111111111111111111111111111111
7479-111111111111111111111111111111
9297-222222222222222222222222222222
9298-222222222222222222222222222222
7481-111111111111111111111111111111
9300-222222222222222222222222222222
7482-111111111111111111111111111111
9302-222222222222222222222222222222
7484-111111111111111111111111111111

Verifying that a missing line wasn't simply cached and written at a different position in the file:

nas:/storage# grep "9290-" test.txt
9290-111111111111111111111111111111
nas:/storage# 

i.e. it's missing (among others) the

9290-222222222222222222222222222222

line. At this point, I'm hoping that I'm simply missing some configuration parameters or a step or two during setup that would fix this problem.

Edit: I only just noticed that the writes seem to wipe each other out, i.e. the gaps in the line numbers always correspond to the number of interleaved writes from the other remote "writer". I'm still no closer to an explanation of why this happens, nor how to resolve it, though.

Also, I had "Discard" and "IO thread" enabled in Proxmox for the VM hard disk and disabled both options, to no effect (I didn't think it would help, but checked nevertheless). The behavior is the same.

  • Disregarding the overly complicated frankenstack of a setup you have there – are you actually trying to get two servers to write to the SAME file? How are you expecting that this is going to work? You need to use a transactional database, not files. – pauska Mar 06 '17 at 13:06
  • Naively, I would have expected NFS locking to be sufficient, i.e. that while Server A has a write lock (file opened for writing), Server B can't establish a write lock, delaying the file open until Server A closes the file again. What you're saying is that this is, indeed, too naive, and that I need to either implement application level locking, or move to, for example, mysql/innodb? – Eddy Buhler Mar 06 '17 at 13:20
  • Your application needs some kind of data logic – how long should a write lock last? PHP doesn't know what a completed write looks like; it could be one character or a hundred lines of text. I'm sure there are flat-file database extensions for PHP that you can use (I'm no programmer, this is best suited for Stack Overflow), but yes – you can't trust the file system to do the job. – pauska Mar 06 '17 at 13:23
  • First hit on Google for "simultaneous write to file over NFS" says this is a bad idea. https://utcc.utoronto.ca/~cks/space/blog/unix/NFSWritePlusReadProblem – mfinni Mar 06 '17 at 13:52

1 Answer

Okay, apparently Berkeley DB provides its own locking mechanisms for concurrent access, so my "simple test scenario" is inadequate: locking has to happen at the application level, and my test script does nothing of the kind, so the test doesn't match the use case.
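
For completeness, here's a minimal sketch of what application-level locking could look like in my test script, using PHP's flock(). This assumes the Linux NFS client maps flock() onto NFS byte-range locks on an NFSv4 mount (the case on current kernels); it's a sketch, not something I verified on this exact setup:

<?php
    // Same test as before, but each write happens under an exclusive
    // advisory lock, so concurrent writers are serialized.
    // Assumption: the NFS client translates flock() into NFS byte-range
    // locks, which holds for NFSv4 mounts on modern Linux kernels.
    $line = str_repeat($argv[1], 30) . "\n";

    for ($i = 1; $i <= 10000; $i++) {
        $of = fopen("test.txt", "a") or die("can't open output file\n");
        flock($of, LOCK_EX) or die("can't acquire lock\n");
        fwrite($of, sprintf("%04d-", $i) . $line);
        fflush($of);             // push PHP's buffer to the file before unlocking
        flock($of, LOCK_UN);
        fclose($of);
    }
?>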

Consequently, I'm considering this question answered. Thanks for the replies!