
This will not be a short question - but please bear with me :)

Back in 2012 I set up svn synchronization from one of our datacenters in the US to multiple mirrors around the world (3 to be exact). It has generally worked without any problems all these years (it hosts test data and is currently 1.2TB @ revision 573453).

Recently I needed to rebuild the master server performing the synchronization (without touching the mirrors) and here's where the problems started... Sometimes, for bigger revisions only, the synchronization from the new master to remote mirrors fails - we have 3 such mirrors and sometimes it fails to all of them and sometimes to just one or two... It never fails to 3 other mirrors within the same datacenter (svn versions 1.10.3, and 2x 1.7.5 (this will be important below ;) )).

When I did an strace svnsync... it looked like the svnsync process got a "Connection reset by peer" error when reading some data way into the synchronization of the "bigger/problematic" revision and then it exited after closing the http connections (it always proceeds for a good while before the failure occurs).

I was not able to find anything obvious in any of the logs (not that I'm too competent in this area ;) - managing these is a 4th level side job).

SVN is served with Apache mod_dav_svn on all systems. It fails the same way regardless of whether I use http or https as the source protocol; the targets are all http at this point.

Once this fails it will fail time and again (to 1,2 or 3 mirrors) and you can attempt it dozens of times and it will fail more or less at the same place (not exactly but close) but...

Here's the kicker...

It will always work, from execution nr 1 and without any fail so far, when the old master is used as a middleman by executing the exact same svnsync command on it...

The old server uses:

bash-4.1$ svnsync --version
svnsync, version 1.7.5 (r1336830)
   compiled May 15 2012, 17:55:12

The new server has:

bash-4.4$ svnsync --version
svnsync, version 1.10.2 (r1835932)
   compiled Feb 10 2021, 09:25:28 on x86_64-redhat-linux-gnu

Could using the older svnsync process cause a different protocol/mode to be selected for the synchronization between the same two repositories that does not have some problem that exists when a matching svnsync is used to synchronize to an "older" repository?

svnsync 1.10.2 SYNC REMOTE FROM 1.10.2 TO 1.7.5 == FAIL (sometimes after a while/longer period of time pushing data)
svnsync 1.7.5 SYNC REMOTE FROM 1.10.2 TO 1.7.5 == SUCCESS (each and every time)

What logs could I check and what logging can I enable to try and narrow it down?

ps. Since this is a "corporate" environment I tried to make sure no "strange network things" ("transparent caches and accelerators and such") are enabled/active and the network teams think they are not ;)

pps. All these servers are VMs run on one or another vmware solution.

RnR

1 Answer


I want to give several recommendations first:

  1. You need to check the Apache HTTP Server's logs on all your servers and see what errors were logged at the time of sync failures.
  2. You need to check the httpd.conf and other relevant configuration files of your Apache HTTP Server.
  3. You need to consider upgrading your SVN and Apache HTTP Servers to the most recent version. SVN 1.7.x and 1.10.x are outdated and not supported.
  4. You need to find out if any firewall or proxy interrupts the svnsync sync operations. An antivirus application can result in this problem, too.

I'm a support engineer with VisualSVN Team and I think that you could consider switching to VisualSVN Server and VDFS. VDFS provides SVN repository mirroring without these downsides of svnsync.

Now answers to your questions.

Could using the older svnsync process cause a different protocol/mode to be selected for the synchronization between the same two repositories that does not have some problem that exists when a matching svnsync is used to synchronize to an "older" repository?

svnsync is a client program, and it appears that you are using an outdated version now. Normally there should be no problems when you use different Subversion client and server versions. However, right now you are using a client that connects to two servers via HTTP(S) to replicate your repositories, and this can be complicated. If you have an antivirus, a firewall or a proxy, any of them can cause the behaviour you describe in your question. Problems with your network can cause it, too. And yes, 1.7 and 1.10 clients can behave differently on the network when they connect to different versions of the SVN+Apache server.

What logs could I check and what logging can I enable to try and narrow it down?

Check the errors produced by svnsync and see the errors in Apache HTTP Server's logs from this time. You'll see several events.
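One way to line up the two sets of errors is to filter the Apache error log by the time of the failed sync. The helper below is only a sketch: the function name `errors_near` is made up for illustration, the log path in the usage comment depends on your distribution, and it assumes the default Apache error log layout where each line starts with a bracketed timestamp followed by a bracketed level such as `[dav:error]` or `[error]`.

```shell
#!/bin/sh
# Hypothetical helper: print error-level lines from an Apache error log
# whose timestamp starts with the given prefix.
#   log_file - path to the error log (assumption: location varies by system)
#   when     - timestamp prefix exactly as written in the log,
#              e.g. "Mon Nov 22 07:4" for 07:40-07:49 on that day
errors_near() {
    log_file=$1
    when=$2
    # -F treats the pattern as a fixed string, so "[" is matched literally.
    grep -F "[$when" "$log_file" | grep -F 'error]'
}

# Usage (path is an assumption, adjust to your setup):
#   errors_near /var/log/httpd/error_log "Mon Nov 22 07:4"
```

Run it on both the source and the target server with the time at which svnsync reported the failure, and compare what each side logged.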

A general recommendation that I think could help you solve the problem.

You normally run svnsync sync on one of two servers involved in the replication (source or target, master or slave). And the URL to a local repository has to use the file:// direct local access protocol. Local URL means the file:// URL to a repository on the server's disk.

When one of the URLs is local, svnsync does not need to contact both servers remotely via HTTP(S) and one of the repositories is always accessed directly on disk.

Here is a syntax example from SVNBook:

svnsync synchronize DEST_URL [SOURCE_URL]

The target or source repository URL has to be local:

  • If you run this command on the source server, then the SOURCE_URL has to be local:

    svnsync sync "https://svn1.example.com/svn/MyRepo" "file:///C:/Repositories/MyRepo"
    
  • If you run this command on the target server then the DEST_URL has to be local:

    svnsync sync "file:///C:/Repositories/MyRepo" "https://svn1.example.com/svn/MyRepo" 
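The two cases above can be wrapped in a small helper so the argument order is not mixed up. This is a minimal sketch, not a tested deployment script: the function name `build_svnsync_cmd`, the repository path and the host name are placeholders, and the function only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the svnsync command for the server it runs on.
#   role       - "source" or "target" (which server this script runs on)
#   local_path - absolute path of the repository on this server's disk
#   remote_url - HTTP(S) URL of the repository on the other server
build_svnsync_cmd() {
    role=$1
    local_path=$2
    remote_url=$3
    local_url="file://${local_path}"
    case "$role" in
        # svnsync sync takes DEST_URL first, then SOURCE_URL.
        source) printf 'svnsync sync %s %s\n' "$remote_url" "$local_url" ;;
        target) printf 'svnsync sync %s %s\n' "$local_url" "$remote_url" ;;
        *) echo "role must be 'source' or 'target'" >&2; return 1 ;;
    esac
}

# Example with placeholder values, as it would be called on the source server:
build_svnsync_cmd source /srv/svn/MyRepo https://mirror.example.com/svn/MyRepo
```

Printing the command first makes it easy to check which URL ends up as the destination before anything is pushed to a mirror.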
    
bahrep
  • Thanks for your detailed reply. The idea to use local paths is interesting, as the strace seemed to indicate a problem on the "source/local" repo which could be avoided this way. I will give it a try when it shows up again. 1.10.2 is the latest available in the internal repository - I'll check with IT as to why and see if we can update. The interesting thing here is that using the older svnsync "solves" the problem - any idea which log exactly I could check? I also didn't see any "verbose" parameter for svnsync etc. (where can I look for its more detailed error logs?) – RnR Nov 22 '21 at 07:42
  • Good news! Using a local path as the source url does *workaround* my problem (thanks for this suggestion!) and allowed me to synchronize immediately after using http url failed 3x - so to me it looks like this svn version or http/apache setup definitely has a bug but thankfully in the area we can bypass using the filesystem directly :) – RnR Nov 22 '21 at 12:21
  • @RnR it's good to know that it worked. :) The logs I mentioned are the logs of your apache http server (see https://www.visualsvn.com/support/svnbook/serverconfig/httpd/#svn.serverconfig.httpd.extra.logging). BTW, you could upvote and mark my answer as accepted since it helped solve the problem. – bahrep Nov 22 '21 at 14:07
  • I will in a few days. I want to give someone a chance to come up with an actual explanation of what the root of the problem was (i.e. that "Connection reset by peer" on read looked like a bug or a misconfiguration). Thanks again – RnR Nov 24 '21 at 09:27
  • @RnR the actual problem is a race condition when using `svnsync` to sync from one repository to another via HTTP(S). Using file:/// local access protocol for the local server solves this problem. This is not a bug by itself, but a misconfiguration. You are also using an outdated version of SVN and maybe other components. – bahrep Dec 02 '21 at 10:41
  • There was certainly no race condition - this was handled properly by other means. – RnR Mar 03 '22 at 12:16
  • @RnR this is what I meant by "race condition": 1. The destination repository receives a transaction and has to commit it. This operation can take some time. 2. svnsync keeps the connection to both source and destination repositories over HTTP(S) and waits for a response from the destination server. The transaction is still being committed. 3. The source server closes the connection due to a timeout because svnsync was still waiting for the response from the destination server. – bahrep Nov 14 '22 at 15:32