
We have 2 Drupal servers that read/write to their own copy of the same folder (the sites/default/files folder for those of you who know a bit about Drupal). Those 2 folders should be in sync. I've been looking into some options and here is what I found out:

OPTION 1: Rsync both ways: not an option
You would need to run rsync in both directions because both folders get modified. As long as files are only added or modified, everything works fine: the -u flag checks modification times and only overwrites the destination if the source is more recent. However, because rsync keeps no history of which files were removed and when, it can't tell whether a file deleted on one side should also be deleted on the other, or kept because it was updated more recently, as the sketch below illustrates.
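
To make the failure mode concrete, here is a minimal sketch of the two-way approach (server names and paths are assumptions):

# Run on server1; server2 and the paths are hypothetical.
rsync -au /var/www/sites/default/files/ server2:/var/www/sites/default/files/
rsync -au server2:/var/www/sites/default/files/ /var/www/sites/default/files/
# -a preserves timestamps/permissions, -u skips files that are newer on the
# receiver. A file deleted on one side is simply restored by the run going
# the other way, while adding --delete would wipe files legitimately created
# on the other server. Neither behaviour is acceptable here.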

OPTION 2: Network share: OK, but I/O wait performance issue
One option would be to set up a network share, removing the need for syncing altogether. The downside is I/O wait, as both servers would read/write to the same disk.

OPTION 3: 3rd server with master copy: OK, but potential performance/race condition issues
Another option would be to have a 3rd server keep a master copy of the folder. Whenever a change is made on one Drupal server, that server's folder would be rsync'ed to the master copy, circumventing the issue raised in option 1. For this to work, however, changes would need to be synced to the master copy in the order they occur on the Drupal servers, which raises the following problems:
-P1: if you sync to the master for every change made and changes are frequent, your servers could get quite busy with the syncing process (see the sketch after this list)
-P2: even if you start the syncing jobs in order, various factors (execution speed of the processes, network delays...) mean you have no guarantee the files will end up being synced in order on the master copy.
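
To illustrate P1, a naive per-change sync using inotify-tools might look like this (a sketch only; the master hostname and paths are made up):

# Fire one rsync to the master copy for every filesystem event:
inotifywait -m -r -e modify,create,delete /var/www/sites/default/files |
while read dir event file; do
    rsync -a --delete /var/www/sites/default/files/ master:/srv/files-master/
done
# With frequent changes this keeps the servers busy (P1), and because each
# rsync takes a different amount of time, nothing guarantees they finish on
# the master in the order they started (P2).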

Q1: How do you address problems P1 and P2?
Q2: Are there any other approaches to keeping 2 remote folders in sync?

Additional info:

server OS:                   Ubuntu server 10.04 LTS
Drupal v:                    Drupal 6.X
Size of sites/default/files: 4.5G

Update 1: testing of Unison

I tested Unison and it doesn't work as I expected regarding deleted files:

[1] Setting up the directories

FOLDER1     FOLDER2
file1 (new)     (empty)

[2] Running Unison (unison FOLDER1 FOLDER2)

FOLDER1        FOLDER2            
new file ---->            file1

=> file1 gets copied from FOLDER1 to FOLDER2

[3] Updating the directories

FOLDER1     FOLDER2
file1 (removed) file1 (modified)

[4] Running Unison again (unison FOLDER1 FOLDER2)

FOLDER1        FOLDER2            
deleted  <-?-> changed    file1  [] 
No default command [type '?' for help]

At this point Unison doesn't know whether it should delete file1 from FOLDER2 or copy it to FOLDER1. I would expect Unison to do the latter, because:
-at [2] we know the last modify/access times of file1 in both folders, and these get recorded in the Unison archives.
-at [4] we see that file1 is missing from FOLDER1, so the time taken into account for the removal should be the last time available in the archive (i.e. the time recorded at [2]).
-at [4] we also see that the last modify/access time of file1 in FOLDER2 is later than the time recorded at [2] for FOLDER1, so file1 should be copied from FOLDER2 to FOLDER1.

I've been trying different switches such as -auto (automatically accept default actions) and -batch (batch mode: ask no questions at all), but still, Unison can't make that decision by itself.
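
For reference, the whole test can be reproduced from a shell like this (a sketch; FOLDER1/FOLDER2 are the directories from steps [1]-[4]):

mkdir FOLDER1 FOLDER2
touch FOLDER1/file1                   # [1] new file on one side only
unison FOLDER1 FOLDER2 -auto -batch   # [2] file1 is propagated to FOLDER2
rm FOLDER1/file1                      # [3] deleted on one side...
echo change >> FOLDER2/file1          #     ...modified on the other
unison FOLDER1 FOLDER2 -auto -batch   # [4] the conflict is skipped; run
                                      #     interactively to see the prompt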

Q: Is there a way to get Unison or another tool to perform according to the behaviour I describe?

Max

5 Answers


Is it necessary for you to use 2 copies of Drupal? Drupal makes a lot of queries per page request, so having multiple Drupal front-ends share a remote database backend can be a pretty big performance penalty.

Have you considered using multiple caching frontends and a single Drupal + database backend? Pressflow is an enhanced version of Drupal that has built-in integration with memcached and Varnish (a caching frontend).

http://pressflow.org/
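
For what it's worth, pointing Drupal 6 / Pressflow at memcached is commonly done through the memcache contrib module; a hedged example of the relevant settings.php lines (module path and server address are examples, not from your setup):

// In sites/default/settings.php:
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');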

goofrider

I think you are over-complicating the problem. All you need is something like NFS: one server accesses the folder locally and the other accesses it remotely via NFS. I don't think NFS is that slow, especially if the two servers are adjacent to each other on the same subnet.

Another option is to use replication at the disk level, such as DRBD.

Using such solutions, you can eliminate the need for manually syncing the changes in both directions.
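
A minimal sketch of the NFS setup (IP addresses and paths are examples):

# On the server that keeps the files, in /etc/exports:
/var/www/sites/default/files  192.168.0.2(rw,sync,no_subtree_check)

# Apply the export on the NFS server, then mount on the other server:
exportfs -ra
mount -t nfs 192.168.0.1:/var/www/sites/default/files /var/www/sites/default/files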

Khaled
  • Such solutions may not be suitable if the network delay is high! – Khaled Jan 26 '12 at 11:53
  • wouldn't disk I/O wait issues apply to `NFS` as well when both servers `read/write` at the same time? – Max Jan 26 '12 at 11:54
  • Are the writes too frequent? You will not know how good or bad it will be unless you do some testing or benchmarking. – Khaled Jan 26 '12 at 12:06
  • We benchmarked our Drupal websites. The main bottleneck, and by far, was I/O wait, hence my concern/question. – Max Jan 26 '12 at 12:08
  • I mean that benchmarking should be done after choosing a solution for syncing both servers, i.e. on the final setup. – Khaled Jan 26 '12 at 12:13
  • I'm not sure a benchmark is useful in this case: if by design `NFS` doesn't support concurrent `read/write` to the same fs, you know I/O wait is still going to be an issue. – Max Jan 26 '12 at 12:38

To address both of your problems, use Unison. It is based on rsync and addresses the problem of deleted files.
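
A minimal sketch of a Unison profile for this case (saved as ~/.unison/files.prf; the server name and paths are assumptions), which you would run with `unison files`:

# ~/.unison/files.prf
root = /var/www/sites/default/files
root = ssh://server2//var/www/sites/default/files
# run unattended and synchronize modification times
batch = true
times = true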

Sacx
  • Thanks for the suggestion. Could you share your general experience with using the tool? (i.e. Do you have experience using `Unison` on large folders (e.g. 4.5GB)? What kind of overhead is there with maintaining the history of deleted/modified files? Are there any particular issues to bear in mind?...) – Max Jan 26 '12 at 12:07
  • I used Unison to sync 2 folders between 2 machines and it worked without any problem until we stopped the service. I never had 4G of data, but I had several hundred small files (maximum 100MB per file) and it worked flawlessly. I didn't care about overhead. But because it uses rsync at its base, it will not have problems with big files. – Sacx Jan 26 '12 at 12:12
  • `Sacx`, can you please look at the update in my question? I've tested `Unison` and as far as I can see it doesn't handle the `deleted <-> updated` situation well. – Max Jan 27 '12 at 07:28
  • This is a "merge situation". Did you try activating backups? – Sacx Jan 27 '12 at 08:11
  • I'm trying to do so, but I found the documentation to be very confusing. Do you happen to have a sample command you could share? – Max Jan 27 '12 at 13:59

We're using Gluster in a similar scenario (Ruby apps, not Drupal). In your case, each machine would be both a Gluster server and a client. The Drupal installation would point at the share as seen by the client configuration. File operations on the share are propagated throughout the cluster, and you should be resilient to one node failing.
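
Roughly, the volume setup could look like this (hostnames and brick paths are assumptions, not from our setup):

# On one node, after peer-probing the other:
gluster volume create drupalfiles replica 2 web1:/data/brick web2:/data/brick
gluster volume start drupalfiles
# On each node, mount via the GlusterFS client and point sites/default/files
# at the mount point:
mount -t glusterfs localhost:/drupalfiles /var/www/sites/default/files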

Yeah, Gluster I/O performance will not be as good as something more physical, but I'm not sure how you would get around that given your setup.

cjc
  • Thanks for the suggestion. With `Option 3` we do address I/O performance, because each server reads/writes to a separate disk. You could have background jobs doing the syncing of files every once in a while (the question being: how often?). – Max Jan 26 '12 at 12:41
  • I don't know anything about Unison. I suggested gluster earlier. – cjc Jan 26 '12 at 23:36

As @Khaled touched on briefly, you could use DRBD for this.

Each node has its own local copy of the data (so reads are as fast as your local disk); you can configure it so that writes block until the data has been written to disk on the other nodes (so there's latency that depends upon the speed of your network, but all clients will see a consistent view of the files).
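
A sketch of such a resource in /etc/drbd.conf (hostnames, devices and addresses are assumptions); note that mounting the same DRBD device read/write on both nodes at once also requires dual-primary mode and a cluster filesystem such as OCFS2 or GFS2:

# Protocol C = synchronous: a write completes only once the peer has it.
resource r0 {
    protocol C;
    on web1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.1:7788;
        meta-disk internal;
    }
    on web2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.2:7788;
        meta-disk internal;
    }
}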

nickgrim