
I am copying files from machineB and machineC to machineA. The shell script below runs on machineA.

If a file is not on machineB, it is guaranteed to be on machineC. So I try to copy each file from machineB first, and if it is not there, I copy the same file from machineC.

I am copying the files in parallel using GNU Parallel, and it works fine. Currently I copy 10 files in parallel.

Here is my shell script:

#!/bin/bash

export PRIMARY=/test01/primary
export SECONDARY=/test02/secondary
readonly FILERS_LOCATION=(machineB machineC)
export FILERS_LOCATION_1=${FILERS_LOCATION[0]}
export FILERS_LOCATION_2=${FILERS_LOCATION[1]}
PRIMARY_PARTITION=(550 274 2 546 278) # this will have more file numbers
SECONDARY_PARTITION=(1643 1103 1372 1096 1369 1568) # this will have more file numbers

export dir3=/testing/snapshot/20140103

find "$PRIMARY" -mindepth 1 -delete
find "$SECONDARY" -mindepth 1 -delete

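do_Copy() {
  local el=$1       # file number
  local PRIMSEC=$2  # destination directory on machineA
  # Try machineB first; if that scp fails, fall back to machineC.
  scp "david@$FILERS_LOCATION_1:$dir3/new_weekly_2014_${el}_200003_5.data" "$PRIMSEC/." ||
    scp "david@$FILERS_LOCATION_2:$dir3/new_weekly_2014_${el}_200003_5.data" "$PRIMSEC/."
}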
export -f do_Copy

parallel --retries 10 -j 10 do_Copy {} "$PRIMARY" ::: "${PRIMARY_PARTITION[@]}" &
parallel --retries 10 -j 10 do_Copy {} "$SECONDARY" ::: "${SECONDARY_PARTITION[@]}" &
wait

echo "All files copied."
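The two `parallel` commands in the script run at the same time, each with `-j 10`, so machineA can have up to 20 scp connections to machineB in flight at once (plus any fallbacks to machineC). If the goal is to cap the total at 10, one option is to feed both partitions to a single `parallel` call. This is a sketch under assumptions, not tested against this setup: `build_jobs` is a hypothetical helper that prints one tab-separated "file number, destination" line per job, and `--colsep '\t'` makes GNU Parallel split each line into `{1}` and `{2}`.

```shell
#!/bin/sh
# Hypothetical helper: print one "<file number><TAB><destination dir>" line per
# file number, so a single GNU Parallel call can process both partitions.
build_jobs() {
  dest=$1
  shift
  for el in "$@"; do
    printf '%s\t%s\n' "$el" "$dest"
  done
}

# Usage inside the bash script, replacing the two parallel calls and the wait
# (assumes do_Copy is exported as before). At most 10 jobs run at once in total,
# and each job runs its scp commands one after the other:
#   { build_jobs "$PRIMARY" "${PRIMARY_PARTITION[@]}"
#     build_jobs "$SECONDARY" "${SECONDARY_PARTITION[@]}"; } |
#     parallel --retries 10 -j 10 --colsep '\t' do_Copy {1} {2}
```

Using a tab as the separator keeps the split unambiguous as long as the destination paths contain no tabs.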

Problem statement:

With the script above, at some point I get this error:

ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host

I suspect this error is caused by too many ssh/scp connections starting at the same time, which makes me think that MaxStartups and MaxSessions in /etc/ssh/sshd_config are set too low.
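That guess is plausible. MaxStartups is read by the sshd on the machine that receives the connection, so for scp started on machineA the relevant setting is the one on machineB and machineC. According to the sshd_config(5) man page, a value of the form start:rate:full (for example 10:30:60) makes sshd refuse a new connection with probability rate/100 once start unauthenticated connections are pending, rising linearly until every attempt is refused at full. That randomness would also fit an error that appears only sometimes. The function below is a sketch of the rule as the man page describes it (not copied from the OpenSSH source; the name `maxstartups_drop_pct` is hypothetical). Only connections still in the key exchange or authentication phase count, so 20 concurrent scp processes do not necessarily mean 20 pending connections.

```shell
#!/bin/sh
# Sketch of the MaxStartups "start:rate:full" early-drop rule described in
# sshd_config(5). Prints the refusal probability (in percent) for a new
# connection when n unauthenticated connections are already pending.
# A single-value setting such as "MaxStartups 10" corresponds to start=full=10.
maxstartups_drop_pct() {
  n=$1 start=$2 rate=$3 full=$4
  if [ "$n" -lt "$start" ]; then
    echo 0      # below "start": never refused
  elif [ "$n" -ge "$full" ]; then
    echo 100    # at or above "full": always refused
  else
    # linear ramp from "rate" percent at "start" to 100 percent at "full"
    echo $(( rate + (100 - rate) * (n - start) / (full - start) ))
  fi
}
```

For example, with 10:30:60 and 20 pending connections, `maxstartups_drop_pct 20 10 30 60` prints 44.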

My question is: on which server is the limit too low, machineB and machineC or machineA? And on which machines do I need to increase it?

On machineA, this is what I find:

root@machineA:/home/david# grep MaxStartups /etc/ssh/sshd_config
#MaxStartups 10:30:60

root@machineA:/home/david# grep MaxSessions /etc/ssh/sshd_config

On machineB and machineC, this is what I find:

[root@machineB ~]$ grep MaxStartups /etc/ssh/sshd_config
#MaxStartups 10

[root@machineB ~]$ grep MaxSessions /etc/ssh/sshd_config
#MaxSessions 10
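A line starting with # in sshd_config is a comment, so on all three machines these settings are not set explicitly and sshd falls back to its compiled-in default, which can differ between OpenSSH versions. Run as root, `sshd -T` prints the effective configuration (for example `sshd -T | grep -i maxstartups`). For reading a config file directly, here is a small sketch; `sshd_config_value` is a hypothetical helper that assumes whitespace-separated `Keyword value` lines and ignores `Match` blocks. It reports the first uncommented occurrence, because sshd uses the first value it obtains for a keyword.

```shell
#!/bin/sh
# Hypothetical helper: print the first uncommented value of a keyword in an
# sshd_config-style file. Keywords are matched case-insensitively; lines such
# as "#MaxStartups 10" have "#MaxStartups" as their first field and never match.
sshd_config_value() {
  file=$1 key=$2
  value=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")
  if [ -n "$value" ]; then
    echo "$value"
  else
    echo "(not set: sshd default applies)"
  fi
}
```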
arsenal

1 Answer


I don't think 10 parallel connections is a high load for ssh. I assume you have passwordless access set up; check whether there is a key issue:

for i in MachineA MachineB MachineC
do
    echo "testing $i"
    ssh -v "$i" exit
done

Check /etc/hosts.deny and /etc/hosts.allow on MachineB and MachineC and make sure connections from MachineA are allowed.

akash
  • In my case all those settings are commented out, as shown above, right? Including MaxStartups? – arsenal Jun 11 '14 at 16:06
  • What I asked you to check is not in sshd_config. Please check whether you have a problem logging in to either host from MachineA, and whether either server's /etc/hosts.allow or /etc/hosts.deny blocks incoming traffic from MachineA. – akash Jun 11 '14 at 16:11
  • Those two files look fine and I don't see anything in them that could cause this problem. I also don't see MachineA's IP address being blocked there: hosts.deny is empty, and hosts.allow is empty too. – arsenal Jun 11 '14 at 16:28
  • Again, is passwordless access set up correctly? Can you try ssh -v from MachineA to MachineB and MachineC? – akash Jun 11 '14 at 16:44
  • Yes, it is set up correctly, since I am able to copy the files, but at some point I get the error mentioned above. I don't get it every time, only sometimes. – arsenal Jun 11 '14 at 17:56
  • You said you don't know which host is giving the error. Maybe you can run a parallel ssh test 50 times against each host and log stderr to a file. Once parallel completes, check the log for ssh_exchange_identification errors. Once we know which host is the culprit, we can look further. – akash Jun 12 '14 at 03:16
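The probe suggested in the last comment could be sketched as follows. It assumes GNU Parallel and passwordless ssh as `david` (taken from the script in the question); the log directory `/tmp/sshprobe` and the helper `count_refusals` are hypothetical. Each attempt writes its stderr to its own file named after the host and attempt number, and `count_refusals` then reports, per host, how many logged lines mention `ssh_exchange_identification`.

```shell
#!/bin/sh
# Hypothetical helper: for each host given after the log directory, count the
# lines in $logdir/<host>.<n>.err that contain "ssh_exchange_identification"
# and print "<host> <count>".
count_refusals() {
  logdir=$1
  shift
  for host in "$@"; do
    # If no logs exist for this host, cat fails quietly and grep -c prints 0.
    n=$(cat "$logdir/$host".*.err 2>/dev/null | grep -c 'ssh_exchange_identification')
    printf '%s %s\n' "$host" "$n"
  done
}

# Usage on machineA (assumes GNU Parallel; 50 attempts per host, 20 at a time):
#   mkdir -p /tmp/sshprobe
#   parallel -j 20 "ssh -o BatchMode=yes david@{1} exit 2> /tmp/sshprobe/{1}.{2}.err" \
#     ::: machineB machineC ::: $(seq 50)
#   count_refusals /tmp/sshprobe machineB machineC
```

`-o BatchMode=yes` makes ssh fail instead of prompting for a password, so a key problem shows up in the logs rather than hanging a job.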