
How can I replicate /var/www/html/ between two separate machines, each with its own hard disk storage, that are both running Apache web servers? I am planning to set up load balancing between these two Apache servers and have configured Nginx as the front end for them. The files should sync from server 1 to server 2 and vice versa.

Greg Askew
Cj Walter

2 Answers


Multiple options / solutions come to my mind:

  • NFS:

    Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed.

  • DRBD aka Distributed Replicated Block Device:

    DRBD® software is a distributed replicated storage system for the Linux platform. It is implemented as a kernel driver, several userspace management applications, and some shell scripts, and is normally used on high availability (HA) computer clusters.

  • csync2:

    Csync2 is a cluster synchronization tool. It can be used to keep files on multiple hosts in a cluster in sync. Csync2 can handle complex setups with much more than just 2 hosts, handle file deletions and can detect conflicts.

There are other options available as well, just take this as a starting point.

gxx

Having code and content managed, promoted, and deployed to all servers is a far better solution than setting up a two-way sync. Consider deploying server assets to every server in your pool, rather than updating one server and counting on sync/replication or managing a shared filesystem.

This allows you to keep your content/code managed separately, so you can review, manage, and roll back changes. An unintentional edit or delete will not harm you.

This will also allow you to spin up additional nodes and help with deploying via a repeatable process.

The simplest iteration of this is to have your web servers periodically do a git pull of a particular branch of your repo.

In a further iteration of this, do the git pull to a staging location, and rsync from there to your HTTP home directory (the web root). This will allow you to exclude things like your repo's .git directory and README files.

For example:

rsync -art --delete --delete-excluded --exclude='.git/' --exclude='README*' #{repo_path}/ #{site_path}
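
The pull-to-staging-then-rsync flow described above could be sketched as a small shell function. This is only a sketch under assumptions: the function names `run` and `deploy_site`, the `DRY_RUN` switch, the remote name `origin`, and the default branch `main` are all hypothetical and not part of the answer. It passes `-a` alone, since `-a` already implies `-r` and `-t`.

```shell
#!/bin/sh
# Sketch only: refresh a staging checkout, then mirror it into the web root.
# All names below (run, deploy_site, DRY_RUN, origin, main) are assumptions.
set -eu

# Run a command, or just print it when DRY_RUN=1 (handy for checking a cron job).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

deploy_site() {
    repo_path=$1          # staging checkout of the repository
    site_path=$2          # directory the web server actually serves
    branch=${3:-main}     # branch to deploy (assumed default)

    # Fast-forward only, so a diverged staging checkout fails instead of merging.
    run git -C "$repo_path" pull --ff-only origin "$branch"

    # Mirror staging into the web root, leaving out repository metadata.
    # The trailing slash on the source copies its contents, not the directory itself.
    run rsync -a --delete --delete-excluded \
        --exclude='.git/' --exclude='README*' \
        "$repo_path"/ "$site_path"/
}
```

Calling something like `deploy_site /srv/staging/site /var/www/html main` from cron on each web server would give the periodic pull (the paths here are placeholders). Setting `DRY_RUN=1` beforehand prints the two commands instead of running them.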

As far as using NGINX to front Apache goes, WHY? What are you hoping to gain from that? They're both web servers that can reverse-proxy and run applications. Unless you have a really good reason, pick one web server, and let the load balancer balance load.

gWaldo
  • This works, as long as the files don't dynamically change, for example via uploads triggered by random users. +1 – gxx Feb 05 '16 at 15:04
    Sorry, I had a bit of a cold sweat there... Ideally, you could get people to stop logging into servers and use a code management process. (It doesn't have to be git; any source control is better than none.) Setting the expectation that servers can go away or be redeployed at any time might be beneficial. – gWaldo Feb 05 '16 at 15:43
  • It really depends on the specific setup(s) and what you're trying to achieve. Just one example: I'm operating multiple tileservers, and requests from clients are distributed round-robin across them. If a client requests a tile which wasn't rendered before, it's rendered and then should be available on all other tileservers as fast as possible (to minimize the risk that it gets rendered again and to prevent costly, unnecessary database queries). `csync2` or `DRBD` are great for this, because the transfer is nearly instantaneous, ... – gxx Feb 06 '16 at 11:30
  • ...(continued) doing something like `git add && git commit && git push` and a `git pull` (or using some different `vcs`) afterwards sounds just crazy: How do you notify the servers to do a `git pull`? Doing it regularly? I think this approach doesn't work well for this use case, because it doesn't scale: it's too slow; at least `git` isn't fast at something like ~ 250GB of data. All in all, even if this is quite old, I think it's still valid: _Use the right tool(s) for the job_. – gxx Feb 06 '16 at 11:36