For performance and run-time optimization, I want to know what is limiting the number of SSH connections I can open.

A Bash script calls X Perl scripts. Each Perl script spawns a new SSH connection to a different IP.

This is how it works:

max_proc_ssh=400

while read codesite ip operateur hostname
do 
    (sleep 3; /usr/bin/perl $DIR/RTR-sshscript.pl $codesite $ip $operateur $hostname) &
    ((current_proc_ssh++))
    if [ $current_proc_ssh -eq $max_proc_ssh ]; then
        printf "Pausing with $max_proc_ssh processes...\n"
        current_proc_ssh=0
        wait
    fi
done <<< "$temp_info"

Each RTR-sshscript.pl spawns a new Expect with an SSH connection and sends a lot of commands; one run takes about 3 minutes:

$exp->spawn("ssh -o ConnectTimeout=$connectTimeout $user\@$ip") or die ("unable to spawn \n");

With max_proc_ssh=200 I have no issue; the scripts run fine. But when I go up to max_proc_ssh=400, the Expect module cannot handle it and sometimes tells me **unable to spawn**. I would say that, out of the 400 expected, only about 350 really start.
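
To see how many really start, I count the running scripts while a batch is up (a quick check, assuming pgrep is available on the box; the script name and user below are just my setup):

pgrep -c -f RTR-sshscript.pl    # number of RTR-sshscript.pl instances currently running
pgrep -c -u restools ssh        # number of ssh processes owned by my user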

What is wrong with this? I am trying to define a sublimit to avoid launching 400 Expects at the same time, something like:

max_proc_ssh=400
max_sublimit_ssh=200

while read codesite ip operateur hostname
do 
    (sleep 3; /usr/bin/perl $DIR/RTR-sshscript.pl $codesite $ip $operateur $hostname) &
    ((current_proc_ssh++))
    ((current_sublimit_ssh++))
    if [ $current_sublimit_ssh -eq $max_sublimit_ssh ]; then
        printf "Pausing sublimit SSH reached..."
        sleep 3
        current_sublimit_ssh=0
    fi
    if [ $current_proc_ssh -eq $max_proc_ssh ]; then
        printf "Pausing with $max_proc_ssh processes...\n"
        current_proc_ssh=0
        current_sublimit_ssh=0
        wait
    fi
done <<< "$temp_info"

This would launch 200 Expects, wait 3 seconds before launching the next 200, and then wait for all 400 to finish before starting over.
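
For what it is worth, a rough alternative I am considering is to keep at most max_proc_ssh scripts running at any time instead of launching fixed batches. This is only a sketch and assumes bash 4.3 or newer for wait -n:

max_proc_ssh=400

while read codesite ip operateur hostname
do
    # block while the number of running background jobs is at the cap
    while [ "$(jobs -rp | wc -l)" -ge "$max_proc_ssh" ]; do
        wait -n    # returns as soon as any one background job finishes
    done
    /usr/bin/perl $DIR/RTR-sshscript.pl $codesite $ip $operateur $hostname &
done <<< "$temp_info"
wait    # wait for the last jobs to finish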

EDIT: As described in the comment section, I added "$!" to the error message, and now I get this:

./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable
./../../../scripts/mynet/RTR-scan-snmp.sh: fork: retry: Resource temporarily unavailable

What does that mean? Am I hitting the fork limit? How can I increase it? By modifying the sysctl.conf file?

When searching a little by myself, I found advice to check what

sysctl fs.file-nr

is saying, but when I start the script it doesn't go higher than this:

 sysctl fs.file-nr
fs.file-nr = 27904      0       793776

The ulimit for my user is 4096, but when the script starts, the counter goes way higher than that:

 sudo lsof -u restools 2>/dev/null | wc -l
25258
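
In case it helps, these are the limit checks I run now; the "fork: retry: Resource temporarily unavailable" message made me look at the per-user process limit as well as the open-file limits (standard bash and procfs interfaces, nothing specific to my setup):

ulimit -u                      # max user processes (nproc); a failed fork() can hit this
ulimit -n                      # max open files per process (nofile); each Expect/ssh holds several
cat /proc/sys/kernel/pid_max   # system-wide ceiling on process IDs
sysctl fs.file-max             # system-wide ceiling on open files
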
  • "unable to spawn" is not an ssh connection issue I don't think. That sounds more like a limit preventing `ssh` from even starting. Look at your `ulimit`s? Look at your max open files? – Etan Reisner Dec 05 '14 at 13:56
  • Unable to spawn is a message that i set when expect DIE $exp->spawn("ssh -o ConnectTimeout=$connectTimeout $user\@$ip") or die ("unable to spawn \n"); (see above :) ) – Gui O Dec 05 '14 at 13:57
  • Good point. So include the actual error in that message and see what was actually failing? – Etan Reisner Dec 05 '14 at 14:00
  • How can i include the error in this message that i set ? with $exp->error ? – Gui O Dec 05 '14 at 14:03
  • At a guess `die ("unable to spawn - $!\n");` – Etan Reisner Dec 05 '14 at 14:07
  • Well, I added the $exp->error but i also added the solution i described above. Expect doesn't seem to show me any error launching 200 by 200 for a total of 400 threads. I guess this was because 400 at the same time was too much... – Gui O Dec 05 '14 at 14:19
  • What error do you get from the original script from `$!` when it fails? – Etan Reisner Dec 05 '14 at 14:23
  • Nevermind, it still have errors... $exp->error just show me "0 at RTR-sshscript.pl line 78" I will launch it again with your $! in a few minutes... – Gui O Dec 05 '14 at 14:47
  • I think `$exp->error` is for errors from an `$exp->expect` call. – Etan Reisner Dec 05 '14 at 14:51
  • Try `sysctl -a | grep -i maxproc` – Mark Setchell Dec 08 '14 at 14:47
  • Why don't you just use GNU Parallel and do it very simply and efficiently? – Mark Setchell Dec 08 '14 at 14:49
  • Your command doesn't return anything. For some company issue. I'm working in a big company, and they have "a packaged linux version" installed on every new Linux server. But GNU Parallel is not in this package. So everytime the scripts will move to another server, somebody will have to reinstall the package... – Gui O Dec 08 '14 at 14:54

1 Answer

It appears that it is not a process limitation but an open-file limitation.

Adding these lines:

restools         soft    nproc           2048
restools         soft    nofile          2048

to the /etc/security/limits.conf file solved the issue! The first line raises the limit on the number of active processes to 2048, and the second raises the limit on open files to 2048. Both were previously 1024.
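
To check that the new values are actually picked up (a quick sketch; limits.conf only applies to sessions opened after the change):

# log back in as restools, then:
ulimit -u    # max user processes -> should now report 2048
ulimit -n    # max open files     -> should now report 2048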

Tested and approved.
