
I am tunnelling into a remote server to access a private API. I forward a single port with the following command, typed into Terminal: ssh -L $local_port:$host:$port -v. I have tried many different local ports (9000, 9001, etc.).

I can use the API directly from a browser, or proxied through nginx, for hours without issue. If I access it from a Play Framework app, however, the ssh session prints "select: Invalid argument" and closes the connection.

Here is what the log looks like when it happens:

debug1: channel 1016: free: direct-tcpip: listening port 9001 for ec2-xxx.compute-1.amazonaws.com port 9000, connect from 127.0.0.1 port 65349, nchannels 1
select: Invalid argument
Connection to ec2-xxx.compute-1.amazonaws.com closed.
Transferred: sent 243904, received 64728 bytes, in 87.8 seconds
Bytes per second: sent 2778.0, received 737.2
debug1: Exit status -1

The Play app uses AsyncHttpClient to make GET requests to localhost:$local_port.

Is there a fix for this on the connection side, e.g. making ssh ignore the error and stay connected?

Johnny Everson
  • Are you running the `ssh` command interactively or in a script? Are you forwarding a single port or multiple ones? Could you give the actual port numbers? What is the full ssh output trace you get with `-v` (or at least a few preceding lines)? – Lætitia Jan 20 '14 at 14:10
  • I responded by adding details to the question. Summary: interactive, single port, tried many ports (e.g. 9001), log added to the question. – Johnny Everson Jan 20 '14 at 14:26
  • Does the system log on the remote server indicate anything? Likely in /var/log/syslog, I'd imagine. Also, use -v -v -v on your local ssh command line for more detail. – etherfish Jan 20 '14 at 14:53

2 Answers


I just noticed that your debug output says channel 1016. I suspect you're running out of file descriptors. I just checked on my Linux laptop, and ulimit -a shows a maximum of 1024 open files, so I imagine you're hitting that limit too. The ideal solution is to figure out why you're using so many simultaneous file descriptors and reduce that number somehow.
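If you save the debug output (ssh writes its -v messages to stderr, so something like 2>ssh.log works), a small script can pull out the highest channel id instead of eyeballing it. This is a minimal sketch; the names max_channel_id and near_fd_limit and the 32-slot margin are hypothetical. It also assumes ssh reuses the lowest free channel slot, which is my reading of OpenSSH rather than something confirmed here, so a high id is only a rough proxy for how many forwarded connections (each holding a socket) were open at once.

```python
import re

# Matches the channel id in OpenSSH debug lines such as
# "debug1: channel 1016: free: direct-tcpip: ..."
# (\b keeps "nchannels 1" from matching)
CHANNEL_RE = re.compile(r"\bchannel (\d+):")


def max_channel_id(log_text: str) -> int:
    """Return the highest ssh channel id found in `ssh -v` output, or -1 if none."""
    ids = [int(m.group(1)) for m in CHANNEL_RE.finditer(log_text)]
    return max(ids, default=-1)


def near_fd_limit(log_text: str, limit: int = 1024, margin: int = 32) -> bool:
    """Heuristic (hypothetical threshold): flag channel ids within `margin` of `limit`."""
    return max_channel_id(log_text) >= limit - margin
```

For example, max_channel_id(open("ssh.log").read()) on the log from the question would report 1016, uncomfortably close to 1024.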

The alternative is simply to raise the maximum number of open file descriptors, as described below.

 Useful debug commands:
 lsof -p <pid>            shows the file descriptors open in a specific process
 ulimit -a                shows your soft limits in the current shell
 ulimit -aH               shows your hard limits in the current shell
 cat /proc/<pid>/limits   shows the limits in effect for a specific process
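The last two checks can also be scripted, which is handy when you want to watch the ssh process while the Play app is hammering the tunnel. This is a Linux-only sketch that reads /proc directly; the function names are hypothetical, and counting /proc/<pid>/fd is only an approximation of what lsof -p reports.

```python
import os


def max_open_files(pid):
    """Return (soft, hard) from the "Max open files" row of /proc/<pid>/limits.

    Linux only. Values are strings because either one may read "unlimited".
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Row layout: "Max open files   <soft>   <hard>   files"
                fields = line.split()
                return fields[3], fields[4]
    raise ValueError(f"no 'Max open files' row for pid {pid}")


def open_fd_count(pid):
    """Count numbered descriptors in /proc/<pid>/fd.

    lsof -p <pid> also lists cwd, txt and memory-mapped files, so its output
    is longer; this count is roughly what the open-file limit constrains.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_report(pid):
    """One-line summary of descriptor usage versus limits for `pid`."""
    soft, hard = max_open_files(pid)
    used = open_fd_count(pid)
    return f"pid {pid}: {used} fds open, soft limit {soft}, hard limit {hard}"
```

Pass it the pid of the ssh client (for example from pgrep -n ssh); reading another user's /proc/<pid>/fd needs the same privileges as lsof would.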

To raise the fd limits, edit /etc/security/limits.conf on both machines to include the following lines:

*       hard    nofile    4096
*       soft    nofile    4096
root    hard    nofile    4096
root    soft    nofile    4096

The new limits only take effect at your next login. I often ssh to localhost to test that kind of thing, but if I were you, I'd reboot if possible.

If you can't reboot the remote machine, at least restart sshd. Before you ssh again, use ulimit -a to confirm that your maximum number of open files (file descriptors) is 4096.

When you ssh into the remote machine, run ulimit -n there to verify that it also says 4096.
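From inside a program, the same numbers come from getrlimit(2). A minimal Python sketch, with hypothetical helper names, reads the soft and hard RLIMIT_NOFILE values and raises the soft limit up to the hard limit, which an unprivileged process is allowed to do (that is why limits.conf sets both). Keep etherfish's later comment in mind: if the code in question uses select(), a higher limit may not help at all.

```python
import resource


def nofile_limits():
    """(soft, hard) RLIMIT_NOFILE for this process: what `ulimit -n` and `ulimit -Hn` report."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target):
    """Raise this process's soft open-file limit toward `target`, capped at the hard limit.

    Never lowers the soft limit and never touches the hard limit.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited; nothing to raise
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return nofile_limits()[0]
```

With the limits.conf entries above in effect, raise_soft_nofile(4096) should be a no-op, because the soft limit is already 4096.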

Good luck.

etherfish
  • Locally, the open file limits are high; on the server the limit is 1024. That might be the cause, so I will investigate. – Johnny Everson Jan 20 '14 at 18:58
  • I looked into the situation further. select(2) should only return -EINVAL if the timeout value is negative or if nfds >= FD_SETSIZE. It turns out that on Linux, FD_SETSIZE is defined as 1024, so even raising the ulimit won't help. I'm afraid you'll need to find some way of reducing the number of simultaneous TCP connections. Sorry I couldn't be more help. – etherfish Jan 20 '14 at 22:54
  • @etherfish, *facepalm*. Excerpt from the manual: "EINVAL nfds is negative or the value contained within timeout is invalid." Yours: "if the timeout value is negative or if nfds >= FD_SETSIZE". Holy crap… – poige Jan 21 '14 at 15:33
  • Yeah, poige; I mistook the openbsd-compat/bsd-poll.c wrapper in the OpenSSH source for the actual system call in use. That wrapper fails if nfds exceeds FD_SETSIZE, whereas the kernel's real core_sys_select() in fs/select.c just clamps nfds to max_fds. Nevertheless, I have confidence in my hypothesis. Thanks for pointing that out! – etherfish Jan 21 '14 at 16:13
  • You can raise FD_SETSIZE. It just takes a few tiny header changes. Ideally, any code that still uses `select` would be changed not to. – David Schwartz Jan 22 '14 at 00:57
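The fixed-size fd_set that these comments discuss is easy to see from the application side. This minimal Python sketch assumes Linux with glibc (FD_SETSIZE = 1024) and uses hypothetical helper names. CPython's select.select() refuses any descriptor >= FD_SETSIZE with a ValueError before calling the kernel, similar in spirit to the openbsd-compat wrapper etherfish mentions, while poll(2) takes an array of descriptors and has no such ceiling. It only illustrates the limit; it does not change how ssh itself waits on its sockets.

```python
import select


def select_accepts(fd):
    """True if Python's select.select() will hand `fd` to the kernel at all."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        return False  # "filedescriptor out of range in select()": fd >= FD_SETSIZE
    except OSError:
        return True   # in range, just not an open descriptor (EBADF)
    return True


def readable_fds(fds, timeout_ms=0):
    """Return the fds that are readable, using poll(2), which has no FD_SETSIZE cap."""
    p = select.poll()
    for fd in fds:
        p.register(fd, select.POLLIN)
    return [fd for fd, events in p.poll(timeout_ms) if events & select.POLLIN]
```

On a glibc system select_accepts(1023) is True and select_accepts(1024) is False, which is exactly the boundary an ssh client with about 1000 forwarded channels would be pressing against if it relied on select().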

The previous answers are more likely to be correct, but if increasing the number of file handles doesn't help, you may want to check your login files (.bashrc, .bash_profile, .login, /etc/login, etc.) for improper use of the bash built-in "select"; see man bash for more information.
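To follow this suggestion without reading each file by hand, a rough Python sketch can flag lines that start a bash select loop (select NAME [in WORDS ...]; do ...; done). The regex is a heuristic: it misses a select that follows ; on the same line and can flag unrelated text such as SQL inside a heredoc. LOGIN_FILES is a hypothetical list that simply mirrors the files named in this answer; missing or unreadable files are skipped.

```python
import re
from pathlib import Path

# Heuristic for the start of a bash `select` loop: "select NAME ..."
SELECT_RE = re.compile(r"^\s*select\s+[A-Za-z_][A-Za-z0-9_]*")

# Files named in the answer; adjust for your shell and distribution.
LOGIN_FILES = [
    Path.home() / ".bashrc",
    Path.home() / ".bash_profile",
    Path.home() / ".login",
    Path("/etc/login"),
]


def find_select_usage(paths=LOGIN_FILES):
    """Yield (path, line_number, line) for lines that appear to start a bash select loop."""
    for p in map(Path, paths):
        try:
            text = p.read_text(errors="replace")
        except OSError:
            continue  # file absent or unreadable on this machine
        for n, line in enumerate(text.splitlines(), 1):
            if SELECT_RE.match(line):
                yield str(p), n, line.rstrip()
```

Running list(find_select_usage()) on an affected machine would list each suspicious line with its file and line number.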

Some Linux Nerd