8

I have a shell script that keeps copying huge files (2 GB to 5 GB) between remote systems. Key-based authentication is used with agent forwarding, and everything works. For example, say the shell script is running on machine-A and copying files from machine-B to machine-C:

"scp -Cp -i private-key ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path"

Now the problem is that the sshd process is continuously consuming a lot of CPU.
For example, top -c on the destination machine (i.e. machine-C) shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                         
14580 ssh_user  20   0 99336 3064  772 R 85.8  0.0   0:05.39 sshd: ssh_user@notty                                                            
14581 ssh_user  20   0 55164 1984 1460 S  6.0  0.0   0:00.51 scp -p -d -t /home/binary/instances/instance-1/user-2993/

This results in a high load average.

I believe scp is taking so much CPU because it is encrypting/decrypting the data. But I don't need encrypted data transfer, as machine-B and machine-C are both on the same LAN.

What other options do I have? I considered rsync, but the rsync man page says:

GENERAL
       Rsync  copies files either to or from a remote host, or locally on the current host (it does not support copying files between two
       remote hosts).

Edit 1: I am already using the ssh cipher arcfour128. That gave a little improvement, but it doesn't solve my problem.

Edit 2: There are other binaries (my main application) running on these machines, and the high load average is causing them to perform poorly.

Varun
  • "rsync doesn't support copying data between remote machines" - erm...what makes you think that? that's *exactly* what most people use it for – Chopper3 Apr 30 '12 at 10:32
  • @Chopper3: IIRC, rsync doesn't support his *very unusual* method of copying with two remote machines. Either source or target has to be local. – Sven Apr 30 '12 at 10:34
  • @Varun: If you don't need the files to be copied quickly, you can use the `-l limit` option to limit the transfer speed. This should lower the CPU usage also. – Khaled Apr 30 '12 at 10:35
  • This is irrelevant anyway, as the usual transport backend of `rsync` is ssh, the same as with `scp`. – Sven Apr 30 '12 at 10:36
  • @Chopper3: The 'rsync' man page says that :) – Varun Apr 30 '12 at 11:12
  • I have modified my question and quoted what the man page says. – Varun Apr 30 '12 at 11:14
  • "This results in high load average." - so what. If you said it was affecting performance elsewhere then it would be worth worrying about, but making your system metrics look nice is not a basis for tuning a system. BTW yes, as mulaz says, it's easy to pass the data via other means, but this may actually be more work for the TCP stack to push more packets across the network. You could still use nc and gzip/gunzip but you'll probably find little difference in the impact compared with scp -C - the encryption part does not require a lot of effort. – symcbean Apr 30 '12 at 11:51
  • @symcbean: I have modified the question to address your concern. There are multiple other services running on those machines and because of high load average they are performing very poorly. – Varun Apr 30 '12 at 12:19

8 Answers

11

This problem can be solved with rsync, and the solution should be competitive in terms of performance.

First, rsync can be invoked from one of the remote systems, which works around its inability to copy between two remote systems directly.

Second, encryption/decryption can be avoided by running rsync in daemon access mode instead of remote-shell access mode.

In daemon access mode, rsync does not tunnel the traffic through an SSH connection; instead it uses its own protocol on top of TCP.

Normally you run the rsync daemon from inetd or stand-alone, but either way that requires root access on the remote system. Assuming root access is not available, it is still possible to start the daemon as an unprivileged user.

Start the rsync daemon as a non-privileged user on the destination machine:

ssh -i private_key ssh_user@destination-IP \
       "echo -e 'pid file = /tmp/rsyncd.pid\nport = 1873' > /tmp/rsyncd.conf

ssh -i private_key ssh_user@destination-IP \
       rsync --config=/tmp/rsyncd.conf --daemon

Actually copy the files:

ssh -i private_key ssh_user@source_ip \
       "rsync [OPTIONS] source-path \
              rsync://ssh_user@destination-IP:1873:destination-path"
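
Note that newer rsync versions expect the daemon side to expose a named module rather than a raw path (see the comment from zaTricky below). A rough sketch of that variant, assuming a hypothetical module name "inbox" backed by a directory that ssh_user can write to:

ssh -i private_key ssh_user@destination-IP "cat > /tmp/rsyncd.conf <<'EOF'
# chroot needs root, so disable it for an unprivileged daemon
pid file = /tmp/rsyncd.pid
port = 1873
use chroot = no
[inbox]
    path = /home/ssh_user/inbox
    read only = no
EOF"

ssh -i private_key ssh_user@destination-IP \
       rsync --config=/tmp/rsyncd.conf --daemon

ssh -i private_key ssh_user@source_ip \
       "rsync [OPTIONS] source-path rsync://ssh_user@destination-IP:1873/inbox/"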
Dima Chubarov
  • I am selecting this as correct answer. The 'netcat' solution given by @mulaz is also good but rsync gives many more options like preserving permissions, timestamps etc. Thanks. – Varun May 04 '12 at 14:44
  • Though this probably used to work, it seems the syntax has changed over the years. You now need to configure a module in the rsyncd.conf file pointing to a folder, set it to `read only = no`, and then the rsync command syntax also has to change to `rsync [OPTIONS] source-path destination-ip:port_nr:modulename/path`. If you use the default port of `873`, `port_nr` can be blank, leaving `destination-ip::modulename/path`. – zaTricky Jun 26 '23 at 22:55
8

The least-overhead solution would be using netcat:

destination$ nc -l -p 12345 > /path/destinationfile
source$ cat /path/sourcefile | nc desti.nation.ip.address 12345

(some netcat versions do not need the "-p" flag for the port)

All this does is send the data unencrypted and unauthenticated over the network from one machine to the other. Of course, it is not the most "comfortable" way to do it.
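
Since the original script runs on machine-A, both ends can be started from there over ssh; a rough sketch, reusing the key and paths from the question (the port number and the sleep are arbitrary):

# start the listener on machine-C in the background (-n keeps ssh off stdin)
ssh -n -i private-key ssh_user@destination-IP \
    "nc -l -p 12345 > /path/destinationfile" &

sleep 2   # give the listener a moment to come up

# stream the file from machine-B straight to machine-C over plain TCP
ssh -i private-key ssh_user@source-IP \
    "nc destination-IP 12345 < /path/sourcefile"

Only the short control commands travel over ssh here; the bulk data goes over the unencrypted netcat connection, so the sshd CPU cost disappears. If compression is wanted, gzip/gunzip can be added to either end of the pipe, as mentioned in the comments above.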

Other alternatives would be trying to change the ssh cipher (ssh -c), or using ftp.

PS: rsync works fine with remote machines, but it is mostly used in combination with ssh, so no speedup here.

mulaz
3

If encryption isn't a concern, throw up an NFS daemon on C and mount the exported directory on B. Then run rsync on B, but specify local directory paths.

Ignoring whatever your use case for involving A is, just prefix the command with ssh user@B, i.e. run ssh user@B rsync ... from A.

This transfers the data without encryption overhead and only transfers the files that differ.

Also, FTP was built with third-party server-to-server transfers as a protocol feature.
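
A rough sketch of that approach, driven from machine-A; the export, mount point, and source path below are placeholders (the destination path is taken from the question), and it assumes root/sudo is available for the NFS setup and mount:

# on machine-C (one-time): export the destination tree over NFS, e.g. by
# adding a line like this to /etc/exports and running exportfs -ra:
#   /home/binary/instances  machine-B(rw,sync,no_subtree_check)

# on machine-A: mount the export on machine-B and rsync with local paths
ssh ssh_user@machine-B "
  sudo mount -t nfs machine-C:/home/binary/instances /mnt/machine-c &&
  rsync -a /local/source-path/ /mnt/machine-c/instance-1/user-2993/
"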

Jeff Ferland
1

You can use a cheaper cipher: for example, rsync --rsh="ssh -c arcfour" to increase the speed. In my tests, the bottleneck became the disks rather than the network connection. And rsync itself is a good choice!
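
Applied to the setup in the question, that would be run on the source machine itself (note that arcfour is a weak legacy cipher and has been dropped from newer OpenSSH releases, so it may not be available everywhere):

# run on machine-B: still ssh transport, just with a cheaper cipher
rsync -aP --rsh="ssh -c arcfour -i private-key" \
    source-path ssh_user@destination-IP:destination-path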

Dom
0

https://github.com/zgiles/ptar is worth a look.

It is easy to compile on Linux and Windows.

For Windows users - compile using

go build -ldflags "-w -extldflags -static -X main.version=1" github.com/zgiles/ptar/cmd/ptar

No make required!

If transferring FROM Windows TO Linux, I recommend a small change to ptar.go - on line 221,

change

hdr.Name = i

to

hdr.Name = filepath.ToSlash(i)

Then run it against a directory (customize the thread count to taste, based on the data size and your resources):

ptar.exe --create --threads=16 --debug --verbose --file=output bss-testing-windows.maplarge.net

This will output 16 files in the working directory. Transfer them using scp, or even over HTTP, e.g. with axel -n 20, which works very well.

Then to recombine, use this script:

mkdir target || true
mkdir finished || true

for i in {0..15}
do
  axel https://bss-testing-windows.maplarge.net/temp/output.${i}.tar -a --output=target/output.${i}.tar
done

cd target

for i in {0..15}
do
  tar -xf output.${i}.tar
  mv output.${i}.tar ../finished
done

This will save you the step of normalizing the path names that you would have to do otherwise, like this:

(NOTE you do not have to do this if you use the change suggested above)

for i in *; do new=${i//\\/\/}; newd=$(dirname "$new"); mkdir -p "$newd"; mv "$i" "$new"; done

tacos_tacos_tacos
0

Try out unison. It is the best option for synchronizing files.

jordiv
0

Maybe you will find http://rightsock.com/~kjw/Ramblings/tar_v_cpio.html interesting.

It discusses parallelizing data transfers between two hosts. Pay particular attention to point no. 5, and adapt it to your needs.
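
In the same spirit as the netcat answer above, the classic pattern for moving a whole directory tree without ssh overhead is to pipe tar through nc; a rough sketch (the port and source path are placeholders, the destination path is from the question):

# on machine-C: unpack whatever arrives on port 12345 into the destination
nc -l -p 12345 | tar -x -C /home/binary/instances/instance-1/user-2993/

# on machine-B: pack the source directory and stream it across
tar -c -C /path/to/source . | nc destination-IP 12345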

Luis
0

I know this would need a little bit of work, but would DRBD work for you? It's like network-based RAID, and keeping two servers in sync is much easier with it if your case fits, at least if you only need one server to replicate to the other and not in both directions all the time.

Janne Pikkarainen