There are two things to discuss here: first, why the command line fails; second, some details of the command itself.
Hypotheses regarding the origin of error
As it stands, the argument omryg@localhost:/cs/sci/omryg/dummy/bla.txt does not look incorrect.
Is it possible that your file transfer.log contains invalid non-printable characters?
I am thinking more specifically of wrong newline markers. This sometimes happens when files are edited on Windows: each line then ends with a carriage return before the line feed (CRLF). vim would show a ^M character at the end of each line, for example; it calls this the "dos" file format.
However, the fact that both lines are processed one after the other hints that the newline characters are detected correctly…
Instead of a one-liner, can you split your command into separate steps, like this:
sshpass -p "mypass" cat transfer.log > local.log
and examine local.log? vim shows non-printable characters, but a more thorough search could involve hexdump.
hexdump -c local.log
would show the characters, 16 per line. The newline character is represented by \n, and a carriage return would appear as \r. Note that in its vanilla usage, hexdump "dumps hex", i.e. outputs hexadecimal codes for characters. The option -c shows the characters themselves instead.
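As a complement to reading the hexdump output by eye, carriage returns can also be counted directly. This is only a minimal sketch: the function names count_cr_lines and strip_cr are made up for this answer, and the output file name in the usage comment is an assumption.

```shell
# Hypothetical helpers, not part of any existing tool.

# count_cr_lines FILE: print how many lines of FILE end with a carriage return.
count_cr_lines() {
    cr=$(printf '\r')                 # a literal CR character
    # grep -c prints the number of matching lines; it exits with status 1
    # when that number is 0, which is not an error here.
    grep -c "${cr}\$" "$1" || true
}

# strip_cr FILE: print FILE with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}

# Possible usage on the file from the previous step:
#   count_cr_lines local.log
#   strip_cr local.log > local.unix.log
```

A count other than 0 would support the line-ending hypothesis; a count of 0 rules it out.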
If it is OK, you can try further:
cat local.log | parallel …
I could not comment below your question because my account is brand new. I will wait for your replies and adapt my answer if needed.
Comments on some details of the command
- Security concerns
In your command lines, you explicitly type your password. Try setting up key-based authentication with the server instead: generate a key pair on your local machine with ssh-keygen, then append the content of the public key (by default ~/.ssh/id_rsa.pub) to the remote file ~/.ssh/authorized_keys (create it if it is absent; the shell redirection operator >> does exactly that, i.e. it appends to a file and creates it if absent). More about it here.
As a general comment, your password should only be stored in your brain, never in a script or in your shell command history. I don't recommend the use of sshpass at all.
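For illustration, here is roughly what has to happen on the server side once you have a public key. It is a sketch under assumptions: install_pubkey is a hypothetical helper written for this answer, and it relies on sshd reading accepted keys from ~/.ssh/authorized_keys, which is the usual default.

```shell
# install_pubkey PUBKEY_FILE AUTH_FILE
# Hypothetical helper: append a public key to an authorized_keys file,
# creating the file if needed and skipping a key that is already listed.
install_pubkey() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    chmod 700 "$(dirname "$auth_file")"   # sshd is picky about permissions
    touch "$auth_file"
    # -q: quiet, -x: match whole lines, -F: fixed string, not a regex
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"   # >> appends, never overwrites
    fi
    chmod 600 "$auth_file"
}

# On the server, after copying the public key over, one might run:
#   install_pubkey id_rsa.pub ~/.ssh/authorized_keys
```

In practice you rarely need to do this by hand: ssh-copy-id performs the same steps from your local machine, and it accepts -p for a non-standard port.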
- Use of parallel and rsync
Now about the use of parallel. It is often considered an alternative to explicit loops (while and for) that runs the iterations in parallel. In your case, you run rsync, a network transfer command, in parallel. Firstly, rsync is designed to process whole directories in a single sequential run. Secondly, regardless of the number of CPUs you use, you are more likely to be limited by the total network bandwidth.
Using parallel may also have a disadvantage: the outputs of your parallelized commands come back in the order in which the jobs finish, not in the order of the input list (unless you pass --keep-order). It might get difficult to diagnose errors with a lot of items to process.
If you really want to restrict the list of files transferred by rsync, you might want to look into the option --include-from=list.txt, where list.txt is a plain-text file of patterns (so plain filenames work), one per line; to actually restrict the transfer, it has to be combined with an --exclude rule for everything else. If you are sure you don't want to use patterns, there is a simpler option, --files-from=list.txt. In that case, you only need to pass a directory as the source argument; rsync will take the listed files from it, with the paths in the list interpreted relative to that directory. More about this option in the man page, and the relevant excerpt has been cited in extenso there.
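To make the --files-from route concrete, here is a sketch that turns a list of full remote paths into the relative paths that option expects. I am assuming that transfer.log holds one entry per line in the form shown at the top of this answer; to_files_from is a hypothetical helper name.

```shell
# to_files_from PREFIX
# Hypothetical helper: read lines such as "user@host:/base/dir/file" on stdin
# and print the part after PREFIX, i.e. the path relative to the source
# directory, one per line.
to_files_from() {
    prefix=$1
    while IFS= read -r line || [ -n "$line" ]; do
        line=$(printf '%s' "$line" | tr -d '\r')   # tolerate DOS line endings
        case $line in
            "$prefix"*)
                rel=${line#"$prefix"}
                printf '%s\n' "$rel"
                ;;
            "")
                ;;                                  # ignore empty lines
            *)
                printf 'skipping unexpected line: %s\n' "$line" >&2
                ;;
        esac
    done
}

# Possible usage (not run here, it needs the remote host):
#   to_files_from 'omryg@localhost:/cs/sci/omryg/dummy/' < transfer.log > list.txt
#   rsync -av -e 'ssh -p 12345' --files-from=list.txt \
#       omryg@localhost:/cs/sci/omryg/dummy/ ./
```

This way a single rsync process handles the whole list over one connection, and its output stays in one place.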
Finally, if you have 2 arguments to process on 2 CPUs, you can drop the -j 2 part: by default, parallel runs as many jobs at once as there are CPU cores, so it is set automatically in your case.
- Configuring an SSH host
I noticed that you use localhost and a port number (-p 12345), which seems to indicate a local tunnel. In case you need to type this often, you can add a "shortcut" to your local SSH configuration (~/.ssh/config):
Host my-proxy
    HostName localhost
    Port 12345
    User omryg
Your command line then simply reads rsync -ave 'ssh' my-proxy:/cs/sci/omryg/dummy ./ (notice the absence of -p 12345 and of omryg@localhost).