I'm currently developing a remote job scheduler in Perl. It has to connect via SSH to a number of servers and execute predefined jobs or job groups.

I use Net::SSH2, which is built on libssh2.

My program usually works fine with around 400 to 500 servers, but when I try to run a basic uptime command on 1000 servers, one or more of my threads hangs and either never finishes or finishes only about 30 minutes later.

It's random: sometimes it finishes on time, sometimes not.

I tracked the problem down to this Net::SSH2 statement: $in .= $buf while $chan->read( $buf, 10240 );

Here is the full code of the connection:

my $chan = $this->{netssh2}->channel() or die $!;
$chan->blocking(1);
$chan->exec($command);
my ($in,$err,$buf,$buf_err);

$in .= $buf while $chan->read( $buf, 10240 );
$err .= $buf_err while $chan->read( $buf_err, 10240, 1 );

$chan->send_eof;
1 while !$chan->eof;

$chan->wait_closed;

I then downloaded a Net::SSH2 source package and modified the XS file (the C/Perl glue code). This showed me that the problem comes from this line: count = libssh2_channel_read_ex(ch->channel, XLATEXT, pv_buffer, size);

This function is part of the libssh2 library: http://www.libssh2.org/libssh2_channel_read_ex.html

Sometimes (about 1 in 1000 times) the program enters this read and never returns. The affected servers are different most of the time.
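For anyone trying to reproduce or work around the hang, here is a minimal sketch of a read loop that cannot block forever. It is not what my scheduler currently does. `drain_channel` is a hypothetical helper name (not part of Net::SSH2), and the 10240-byte chunk size and 50 ms back-off are arbitrary choices. It only assumes the channel object offers `blocking`, `read` and `eof`, as Net::SSH2::Channel does.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper (not part of Net::SSH2): collect stdout and stderr
# from a channel in non-blocking mode and give up after $timeout seconds.
sub drain_channel {
    my ($chan, $timeout) = @_;
    my $deadline = time() + $timeout;
    my ($out, $err) = ('', '');

    $chan->blocking(0);    # read() now returns at once when nothing is ready

    while (1) {
        my $got = 0;
        # Alternate between stdout (0) and stderr (1) so that neither
        # stream can back up while we are waiting on the other one.
        for my $ext (0, 1) {
            my $buf = '';
            my $n = $chan->read($buf, 10240, $ext);
            next unless $n;    # undef (nothing ready, or an error) or 0 bytes
            if ($ext) { $err .= $buf } else { $out .= $buf }
            $got += $n;
        }
        next if $got;          # data flowed, so try again right away
        last if $chan->eof;    # remote sent EOF and nothing was left to read
        # A persistent read error also ends up here, since this sketch
        # does not tell "no data yet" apart from a real failure.
        die "timed out after ${timeout}s\n" if time() > $deadline;
        sleep 0.05;            # back off briefly instead of spinning
    }
    return ($out, $err);
}
```

It would be called right after `$chan->exec($command)`, for example `my ($in, $err) = eval { drain_channel($chan, 30) };`, so that a stuck server produces an error I can log with the host name instead of a hung thread.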

Do you have any idea what I should be looking for or checking? I've been working on this for a few days, and I'd very much appreciate some outside advice :)

Azryel
  • Unless you post the related parts of your code, including how session and channels objects are created and manipulated it is impossible to help you. – salva Nov 12 '15 at 11:46
  • I added the SSH connection part of the code (I don't think it will help, but it might make things clearer). I've read that libssh2 is not thread safe, so maybe that's my problem? I used Net::OpenSSH as a replacement and it works fine, but I like Net::SSH2 more, so if there is any possibility of making it work with Net::SSH2, I'm listening =) – Azryel Nov 16 '15 at 13:07
  • What are you passing as `$command` exactly? `uptime`? Your code will block if too much data comes through `stderr`. Also include the code initializing the Net::SSH2 session and establishing the connection. Can you reproduce the issue if you do the calls in sequence from just one thread? – salva Nov 16 '15 at 14:03
  • Also, include the versions of Net::SSH2 and libssh2. You could also enable debugging for libssh2: `$ssh2->debug(1)` (but that needs a libssh2 compiled with debugging support). – salva Nov 16 '15 at 14:07
