Why use double backslashes in sed when running ssh?

Question

I ran across the following in Gentoo Linux's wiki about a dynamic jumphost list:

    ProxyCommand ssh $(echo %h | sed 's/+[^+]*$//;s/\([^+%%]*\)%%\([^+]*\)$/\2 -l \1/;s/:/ -p /') nc -w1 $(echo %h | sed 's/^.*+//;/:/!s/$/ %p/;s/:/ /')

It works, but I would like to understand the sed expression completely.

Reading its original reference, I was able to get a good understanding of the recursive invocation of the command, using the Host *+* pattern. But I have two questions:

The expression uses %%. To see why, I ran ssh -v and observed that when the ssh client parses $HOME/.ssh/config, the first % appears to be stripped. To confirm this, I downloaded the OpenSSH source code, but readconf.c didn't give me a clue. I am new to the OpenSSH source, but I am not afraid to compile it with debug info and step through it in gdb. Nevertheless, if there is a quicker way to confirm my conjecture, I would appreciate a hint.

The ssh -v also revealed that:

    [...]
    debug1: Executing proxy command: exec ssh $(echo zackp%node0+zackp%node1+node3 | sed 's/+[^+]*$//;s/\\([^+%]*\\)%\\([^+]*\\)$/\\2 -l \\1/;s/:/ -p /')
    [....]

i.e. the \( is now escaped with a \ in the subshell. Why is this necessary?

Thanks,

--Zack

Nicholas Wilson · Accepted Answer · 2012-08-26T12:27:22.970

Good question. It's a pretty tortuous command! It sounds like you've pretty much got it though. On your machine, the host string has one of the plus-separated hops stripped off; for convenience, that token then has any port and user extracted and turned into options (-l and -p). Finally, the information about the other hops is popped into a string to pass to netcat. ssh on your machine makes the one connection, and executes netcat on its target machine with the string containing the information about the remaining hops. The same process then happens again there, and so on, until all the hops are done, with a netcat instance running on each relay to forward the traffic. Pretty tidy bit of command-line fun!
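To see the splitting concretely, the two sed scripts can be run by hand on the host string from the question. This is only a sketch: the function names `next_hop_args` and `nc_args` are made up for illustration, `%%` is written as `%` (as it appears after ssh has processed the option), and `%p` is replaced by a port passed as a second argument.

```shell
#!/bin/sh
# Hypothetical helper names; the sed scripts are the ones from the question,
# with %% already collapsed to % and %p replaced by a port number.

# Arguments for the ssh call: everything except the last hop, with the
# user and port of the now-final token turned into -l and -p options.
next_hop_args() {
    printf '%s\n' "$1" |
        sed 's/+[^+]*$//;s/\([^+%]*\)%\([^+]*\)$/\2 -l \1/;s/:/ -p /'
}

# Arguments for nc: the last hop, with a default port if none was given.
nc_args() {
    printf '%s\n' "$1" |
        sed 's/^.*+//;/:/!s/$/ '"$2"'/;s/:/ /'
}

next_hop_args 'zackp%node0+zackp%node1+node3'   # zackp%node0+node1 -l zackp
nc_args 'zackp%node0+zackp%node1+node3' 22      # node3 22
```

The first result still contains a `+`, so it would match the `Host *+*` pattern again, which is where the recursion comes from.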

Your specific questions:

Why are the % signs escaped? This is specific to the ProxyCommand option! From the man page regarding ProxyCommand:

In the command string, any occurrence of ‘%h’ will be substituted by the host name to connect, ‘%p’ by the port, and ‘%r’ by the remote user name.

Like all well-behaved unix utilities, when there's a metacharacter going on, the natural thing is to use that character doubled to represent a literal. Otherwise, there's no way to represent certain strings! It was probably just added by the programmer out of neatness, without thinking that someone would write his own mini-syntax for jump lists using % and post it on the Gentoo wiki!
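The doubling rule can be imitated in a few lines of shell. This is a rough sketch and not the OpenSSH code: `percent_expand_sketch` is a made-up name, it only knows `%h`, `%p` and `%%`, and it passes any other `%` sequence through unchanged, which the real implementation may not do.

```shell
#!/bin/sh
# Hypothetical imitation of the substitution: one left-to-right pass
# that replaces %h, %p and %% and copies everything else verbatim.
percent_expand_sketch() {
    tmpl=$1 host=$2 port=$3 out=
    while [ -n "$tmpl" ]; do
        case $tmpl in
            %%*) out="$out%";     tmpl=${tmpl#??} ;;
            %h*) out="$out$host"; tmpl=${tmpl#??} ;;
            %p*) out="$out$port"; tmpl=${tmpl#??} ;;
            *)   rest=${tmpl#?}               # all but the first character
                 out="$out${tmpl%"$rest"}"    # append the first character
                 tmpl=$rest ;;
        esac
    done
    printf '%s\n' "$out"
}

# The %% pairs come out as single % signs; %h and %p are filled in.
percent_expand_sketch 's/\([^+%%]*\)%%/ %h %p' node3 22
# prints: s/\([^+%]*\)%/ node3 22
```

A single pass matters here: `%%h` has to come out as a literal `%h`, which separate search-and-replace steps would get wrong.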

The % codes are specific to this option, so the escaping is probably buried somewhere near where the option is handled in the OpenSSH source.

Fiddly question! The string specified as the ProxyCommand option isn't a command string that will be passed to ssh directly; it's specifically executed "using the user's shell". So, what goes in the option is meant to be user-friendly, so that you can type into your ssh config file what you'd type in your shell.

Now, most people (including me!) aren't too fussed about 100% precise logging, but the OpenBSD guys have a strnvis function that OpenSSH passes over all log strings before outputting them. It encodes control characters and other nasties so that the log output gives a readable record of the precise (null-free) buffer passed in by the string logging functions. This is great, but the only trick is that when reading the logs, you have to 'strunvis' it back to its original form.

Basically, the backslash is an oddity of their logging format. It isn't passed to the shell.
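The effect on the debug line can be imitated with sed. This sketch mimics only the backslash doubling seen in the log, not the full encoding that `strnvis` performs, and the names `log_encode_sketch` and `log_decode_sketch` are invented for the example.

```shell
#!/bin/sh
# Mimic only the backslash doubling visible in the debug output.
log_encode_sketch() { printf '%s\n' "$1" | sed 's/\\/\\\\/g'; }
# Undo it, the way a reader has to when interpreting the log line.
log_decode_sketch() { printf '%s\n' "$1" | sed 's/\\\\/\\/g'; }

cmd='s/\([^+%]*\)%\([^+]*\)$/\2 -l \1/'
logged=$(log_encode_sketch "$cmd")

printf 'logged:  %s\n' "$logged"
# logged:  s/\\([^+%]*\\)%\\([^+]*\\)$/\\2 -l \\1/
printf 'decoded: %s\n' "$(log_decode_sketch "$logged")"
# decoded: s/\([^+%]*\)%\([^+]*\)$/\2 -l \1/
```

The decoded line is the sed script as the shell receives it, with single backslashes.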

Now, I'm guessing here (I don't think it's worth delving too deeply!), but basically the question's about the output of the logging ssh spits out when it's being verbose. I've written logging for process launching before, and it's a bit of a sloppy art, given how complicated arguments can be (embedded newlines? trailing whitespace? crazy quotes?). You don't often need a 100% "accurate" way of logging losslessly the arguments to exec, because it's too tedious. It looks like the author of the OpenSSH code here, when hunting for a single string to log, just spat out the escaped form of the string he had handy for passing as the last argument to sh. It's not a 'perfect' representation of what's going to be exec'ed (because I suspect some whitespace gets lost in logging) and it's perhaps not the most user-friendly thing to log (because it's got more escaping than you typed in!), but it's fine.

Thanks for answering my Qs so rapidly. Regarding 1. I did find out that in the most recent openbsd `ssh`, line 239 of `auth.c` has this to say, quoted, "Currently, %% becomes '%'", but I wasn't satisfied as it's for a function `char *expand_authorized_keys`. Oddly, so far, my reading of `readconf.c` still has not uncovered anything for me. — user183394, Aug 26 '12 at 00:36
Ha! I found it :-) Regex rules. It's in `misc.c`. The function `char *percent_expand(const char *string, ...)`. The function's comments are very telling. I think that I definitively found the answer of my 1st Q. — user183394, Aug 26 '12 at 00:41
I guess I shouldn't have been lazy and should have looked at the actual files (it's in /usr/src/crypto/openssh/sshconnect.c on my system). To get back to Q2, how the command is escaped and processed: notice, very interestingly, how it uses `exec`. That's really neat, actually, and I probably wouldn't have thought of it. It avoids the problem of unintelligent shells: normally, the shell launches children to execute each command line, and waits for them before doing the next line. That's often not needed for the last line (it can exec), but using the exec builtin forces the shell to perform that optimization. — Nicholas Wilson, Aug 26 '12 at 11:08
Gaaarh! It's come to me where that extra blinking backslash is coming from! It's one of those lovely OpenBSD-isms (I'm a FreeBSD user, so I shouldn't moan too much...). They have a delightful function `strnvis` that encodes an arbitrary string losslessly into a printable one (including control characters, etc). ssh runs strnvis over all its log output! The extra backslashes aren't going to the shell at all. Clearly, these guys aren't sloppy and don't find it tedious to make sure their whitespace doesn't get lost in logging! I retract my statements above. Answer amended. — Nicholas Wilson, Aug 26 '12 at 12:18
Your proficiency in BSD pays off :-) I haven't gotten that far. Agreed, this aspect is fiddly! At any rate, it's been a fun conversation with you. Answer accepted. Many thanks! — user183394, Aug 26 '12 at 16:55
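
The `exec` point raised in the comments can be checked with a small experiment. This is a sketch with made-up function names, and it only shows the general shell behaviour, not anything specific to ssh: with `exec` the second command runs in the same process as the shell, and without it the shell forks a child.

```shell
#!/bin/sh
# With exec: the inner sh replaces the outer one, so both print the same PID.
with_exec() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Without exec: the inner sh is a child process with its own PID.
# The trailing ":" keeps the inner sh from being the last command, so the
# shell cannot silently apply the exec optimization on its own.
without_exec() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}

with_exec      # two identical numbers
without_exec   # two different numbers
```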
