Parallel rsync from remote server gets unexpected remote arg

Question

I am trying to pull files from a remote server to my local machine using parallel. To test this, I created a directory on my remote server with two files, dummy/bla.txt and dummy/bli.txt. The following command works:

sshpass -p "mypass" rsync -ave "ssh -p 12345" omryg@localhost:/cs/sci/omryg/dummy ./

receiving incremental file list
dummy/
dummy/bla.txt
dummy/bli.txt

sent 66 bytes  received 195 bytes  104.40 bytes/sec
total size is 0  speedup is 0.00

To run this with parallel, I first created a file transfer.log containing the two file names, one per line. Then I ran:

sshpass -p "mypass" cat transfer.log | parallel --will-cite -j 2 rsync -ave "ssh -p 12345" omryg@localhost:/cs/sci/omryg/dummy/{} ./

Unexpected remote arg: omryg@localhost:/cs/sci/omryg/dummy/bla.txt
rsync error: syntax or usage error (code 1) at main.c(1354) [sender=3.1.3]
Unexpected remote arg: omryg@localhost:/cs/sci/omryg/dummy/bli.txt
rsync error: syntax or usage error (code 1) at main.c(1354) [sender=3.1.3]

Nicolas Cornuault · Answer 1 · 2020-04-06T13:44:50.470

There are several things to discuss here.

First the failure of the command line, second the details of this line.

Hypotheses regarding the origin of error

As it is, the argument omryg@localhost:/cs/sci/omryg/dummy/bla.txt does not seem incorrect.

Is it possible that there are invalid non-printable characters in your file transfer.log?
I am thinking more specifically of wrong end-of-line markers. This sometimes happens when a file has been edited on Windows, where each line ends with a carriage return followed by a newline (CRLF). vim may then show a ^M character at the end of each line, and refers to this as the "dos" file format.
However, the fact that both lines are treated sequentially hints at correctly detected newline characters…

Instead of a one-liner, can you decompose your code as such

sshpass -p "mypass" cat transfer.log > local.log

and examine local.log? vim shows non-printable characters, but a more thorough search could involve hexdump.

hexdump -c local.log

would show the characters, 16 per line. The newline character is represented by \n. Note that in its vanilla usage, hexdump "dumps hex", i.e. outputs hexadecimal codes for characters. The option -c shows representations of the said characters.
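As a quicker check than reading the dump by eye, the following sketch counts the lines that contain a carriage return and writes a cleaned copy. The file names demo.log and clean.log are placeholders chosen for this illustration; demo.log is generated here with CRLF line endings to stand in for a damaged transfer.log.

```shell
# Stand-in for a transfer.log saved with Windows (CRLF) line endings.
printf 'bla.txt\r\nbli.txt\r\n' > demo.log

# Hold a literal carriage return in a variable (portable across POSIX shells).
cr=$(printf '\r')

# Count the lines that contain a carriage return; 0 would mean the file is clean.
crlf_lines=$(grep -c "$cr" demo.log)
echo "lines with CR: $crlf_lines"

# Strip every carriage return and keep the result in a separate file.
tr -d '\r' < demo.log > clean.log
```

If the count is not zero for your real file, feeding the cleaned copy to parallel instead of the original would be the next thing to try.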

If it is OK, you can try further:

cat local.log | parallel …
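If local.log turns out to be clean, quoting is another candidate worth testing. As far as I understand it, parallel hands the command to a shell as a single string, so the quotes around "ssh -p 12345" may be consumed before rsync starts, and rsync would then receive ssh, -p and 12345 as separate words. The sketch below only imitates that second round of parsing with eval; it does not call parallel or rsync. count_args is a hypothetical helper, and src and dst are placeholders.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Typed directly: the quotes keep "ssh -p 12345" together as one argument.
direct=$(count_args -ave "ssh -p 12345" src dst)

# Imitation of a second round of shell parsing: the quotes are already gone,
# so the same text splits into separate words.
reparsed=$(eval 'count_args -ave ssh -p 12345 src dst')

echo "direct: $direct, reparsed: $reparsed"
```

If this is indeed the cause, protecting the quotes from the first shell, for instance with rsync -ave '"ssh -p 12345"', or running parallel with its -q option, should let the remote-shell setting reach rsync intact.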

I could not comment below your question because my account is brand new. I will wait for your replies and adapt my answer if needed.

Comments on some details of the command

Security concerns

In your command lines, you type your password explicitly. Try setting up key-based authentication with the server instead: generate a key pair on your local machine with ssh-keygen, then append the content of the public key (by default ~/.ssh/id_rsa.pub) to the remote file ~/.ssh/authorized_keys (create it if it is absent; the bash redirection operator >> does that, i.e. it appends to a file and creates it if absent). More about it here.
As a general comment, your password should only be stored in your brain, never in a script or in your shell command history. I don't recommend the use of sshpass at all.
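A minimal sketch of the server-side step, that is, appending a public key to ~/.ssh/authorized_keys with restrictive permissions. The function name install_pubkey and its two parameters are made up for this illustration. The key pair itself would be generated beforehand on the local machine, for example with ssh-keygen -t ed25519. Where it is installed, ssh-copy-id performs the copy in one step.

```shell
# install_pubkey is a hypothetical helper, meant to be run on the server.
#   $1: path to the public key file that was copied over
#   $2: SSH directory to install into (defaults to ~/.ssh)
install_pubkey() {
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}

    # Only the owner should be able to enter the SSH directory.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"

    # >> appends to authorized_keys and creates the file if it is absent.
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys"

    # Only the owner should be able to read or modify the key list.
    chmod 600 "$ssh_dir/authorized_keys"
}
```

Once the key is in place, the rsync command from the question should work without sshpass and without typing the password.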

Use of parallel and rsync

Now about the use of parallel. It is often considered an alternative to explicit loops (while and for), running the iterations in parallel. In your case, you run rsync, a network transfer command, in parallel. Firstly, rsync is optimized for sequential transfers and for analyzing whole directories. Secondly, regardless of the number of CPUs you use, you are more likely to be limited by the total network bandwidth.
Using parallel may also have a disadvantage: by default, the output of each job is printed when that job finishes, so the results appear in completion order rather than in the order of the input (unless you pass --keep-order). It might get difficult to diagnose errors with a lot of items to process.
If you really want to constrain the list of files transferred by rsync, you might want to look into the option --include-from=list.txt, where list.txt is an ASCII file of patterns (so, plain filenames work), one per line. If you are sure you don't want to use patterns, there is a simpler option, --files-from=list.txt. In that case, you only need to pass a directory as the source argument; rsync will take the listed files from it. More about this option in the man page, and the relevant excerpt has been cited in extenso there.
Finally, you can skip the part -j 2: by default, parallel runs one job per CPU core, so with 2 arguments and at least 2 cores, both jobs already run at the same time.
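A sketch of the --files-from variant, reusing the host, port and paths from the question. The function name pull_listed_files and the file name list.txt are placeholders for this illustration. The rsync call is wrapped in a function so that nothing contacts the server until you call it.

```shell
# File list: one path per line, relative to the source directory given to rsync.
printf '%s\n' bla.txt bli.txt > list.txt

# pull_listed_files is a hypothetical wrapper around a single rsync invocation
# that fetches every file named in list.txt. The listed paths are recreated
# relative to the destination, so the files arrive as ./bla.txt and ./bli.txt.
pull_listed_files() {
    rsync -av -e "ssh -p 12345" --files-from=list.txt \
        omryg@localhost:/cs/sci/omryg/dummy/ ./
}
```

Calling pull_listed_files then transfers the whole list over one connection, with no need for parallel.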

Configuring an SSH host

I noticed that you use localhost and a port number -p 12345, which seems to indicate a local tunnel. In case you need to type this often, you can complete your local SSH configuration (~/.ssh/config) with a "shortcut":

host my-proxy
    HostName localhost
    Port 12345
    User omryg

and now your command line simply reads rsync -ave 'ssh' my-proxy:/cs/sci/omryg/dummy ./. Notice the absence of -p 12345 and of omryg@localhost.
