We have 2 Drupal servers that read/write to their own copy of the same folder (the sites/default/files folder, for those of you who know a bit about Drupal). Those 2 folders should be in sync. I've been looking into some options and here is what I found out:
OPTION 1: Rsync both ways: not an option
You would need to run rsync both ways because both folders get modified. As long as files are only modified, everything works fine, because you can use the -u flag, which compares modification times and only overwrites a file when the source is more recent than the destination. However, because rsync doesn't keep a history of which files were removed and when, it wouldn't know what to do with a file deleted on one side: delete it on the other side as well, or keep it there because that copy was updated more recently.
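The deletion problem shows up even in a small model that never calls rsync. The Python sketch below treats each folder as a dict mapping file name to modification time and imitates a pair of rsync -u runs. The function sync_newer, the file names and the timestamps are illustrative assumptions, not part of rsync.

```python
def sync_newer(src, dst):
    """Imitate 'rsync -u src/ dst/': copy a file when dst lacks it or holds an older copy."""
    for name, mtime in src.items():
        if name not in dst or dst[name] < mtime:
            dst[name] = mtime


# Both folders start in sync: file name -> modification time.
folder1 = {"a.jpg": 100, "b.jpg": 100}
folder2 = {"a.jpg": 100, "b.jpg": 100}

# b.jpg is deleted on server 1, a.jpg is modified on server 2.
del folder1["b.jpg"]
folder2["a.jpg"] = 200

# Two-way sync with no record of the previous state.
sync_newer(folder1, folder2)
sync_newer(folder2, folder1)

print(folder1)
print(folder2)
```

The modification to a.jpg propagates correctly, but the deleted b.jpg comes back on server 1, because a missing file looks the same as one that never existed there. Adding --delete to the first run would instead remove any file newly created on server 2.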
OPTION 2: Network share: OK, but I/O wait performance issue
One option would be to set up a network share, removing the need for synching. The downside is I/O wait, as both servers would read/write on the same disk.
OPTION 3: 3rd server with master copy: OK, but potential performance/race condition issues
Another option would be to have a 3rd server keeping a master copy of the folder. Whenever a change is made on one Drupal server, that server's folder would be rsync'ed to the master copy, circumventing the issue raised in option 1. For this to work, however, changes would need to be synched to the master copy in the order they occurred on the Drupal servers, raising the following problems:
-P1: if you sync to master for every change made and changes are frequent, your servers could get quite busy with the synching process
-P2: even if you start the synching jobs in order, due to various elements (execution speed of process, network delays...) you have no guarantee the files will end up being synched in order on the master copy.
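P2 can be illustrated without any network. In the Python sketch below each sync job is a hypothetical tuple of start time, transfer duration and the content it uploads. The master ends up with whatever finishes last, not whatever was started last. The function apply_jobs and the timings are made up for illustration.

```python
def apply_jobs(jobs):
    """Return the master copy's final content for one file.

    Each job is (start_time, duration, content); the master keeps the
    content of the job that finishes last.
    """
    finished = sorted(jobs, key=lambda job: job[0] + job[1])
    return finished[-1][2]


# Server A saves v1 at t=0 over a slow link; server B saves v2 at t=1 over a fast one.
jobs = [
    (0, 5, "v1 from server A"),  # finishes at t=5
    (1, 1, "v2 from server B"),  # finishes at t=2
]

final_content = apply_jobs(jobs)
print(final_content)
```

Although v2 was written later, the older v1 overwrites it on the master because its transfer completes last.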
Q1: How do you address problems P1 and P2?
Q2: Are there any other approaches to keeping 2 remote folders in sync?
Additional info:
server OS: Ubuntu server 10.04 LTS
Drupal v: Drupal 6.X
Size of sites/default/files: 4.5G
Update 1: testing of Unison
I tested Unison and it doesn't work as I expected regarding deleted files:
[1] Setting up the directories
FOLDER1 FOLDER2
file1 (new) (empty)
[2] Running Unison (unison FOLDER1 FOLDER2)
FOLDER1 FOLDER2
file ----> file1
=> file1 gets copied from FOLDER1 to FOLDER2
[3] Updating the directories
FOLDER1 FOLDER2
file1 (removed) file1 (modified)
[4] Running Unison again (unison FOLDER1 FOLDER2)
FOLDER1 FOLDER2
deleted <-?-> changed file1 []
No default command [type '?' for help]
At this point Unison doesn't know whether it should delete file1 from FOLDER2 or copy it to FOLDER1. I would expect Unison to do the latter as:
-at [2] we know the last modify/access times of file1 in both folders and these get copied in the Unison archives.
-at [4] we see file1 is missing from FOLDER1, so the time taken into account for the removal should be the last available time in the archive (i.e. the time obtained at [2]).
-at [4] we also see that the last modify/access time of file1 in FOLDER2 is greater than the time recorded at [2] for FOLDER1, so file1 should be copied from FOLDER2 to FOLDER1.
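The rule described in the three points above can be written down as a small function. This is only a sketch of the behaviour I would expect, not of what Unison does internally. The function resolve, its parameters and the timestamps are hypothetical.

```python
def resolve(archived_mtime, surviving_mtime):
    """Decide what to do when a file is missing on one side.

    archived_mtime: modification time recorded at the last successful sync.
    surviving_mtime: current modification time of the copy that still exists.
    """
    if surviving_mtime > archived_mtime:
        # The surviving copy changed after the last sync: restore it.
        return "copy back"
    # The surviving copy is untouched since the last sync: propagate the deletion.
    return "delete"


# Step [2] archived file1 at t=100; FOLDER2's copy was modified at t=150.
print(resolve(archived_mtime=100, surviving_mtime=150))
# Same deletion, but FOLDER2's copy was never touched after the sync.
print(resolve(archived_mtime=100, surviving_mtime=100))
```

With these sample values the first call chooses to copy file1 back to FOLDER1 and the second chooses to delete it from FOLDER2.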
I've been trying different switches such as -auto (automatically accept default actions) and -batch (batch mode: ask no questions at all), but still, Unison can't make that decision by itself.
Q: Is there a way to get Unison or another tool to perform according to the behaviour I describe?