
I'm dealing with a large private /8 network and need to enumerate all web servers that are listening on port 443 and report a specific version in their HTTP response headers.

First I thought I would run nmap with connect scans and grep through the output files, but this turned out to produce many false negatives, where nmap reported a port as "filtered" while it actually was "open" (connect scan used: nmap -sT -sV -Pn -n -oA foo 10.0.0.0/8 -p 443).

So now I am thinking of scripting something with bash and curl; the pseudocode would look like this:

for each IP in 10.0.0.0/8
do:
    curl --head "https://${IP}:443" | grep -iE "Server: Target" > "${IP}_info.txt"
done

As I'm not that familiar with bash, I'm not sure how to script this properly. I would have to:

  • loop through all IPs
  • make sure that only X jobs run in parallel
  • ideally write only the IPs of matching hosts to one single file
  • ideally record only matching server versions

Any suggestion or pointing into a direction is highly appreciated.

skrskrskr

2 Answers


Small Scale - iterate

For a smaller IP address span it would probably be fine to iterate like this:

for ip in 192.168.1.{1..10}; do ...

As stated in this similar question.
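For instance, a complete small-scale version of your curl/grep idea could look like the sketch below ("Server: Target" is a stand-in for whatever version string you are matching, and -k/--insecure may be needed if the internal hosts use self-signed certificates):

#!/bin/bash
# Sequentially probe a small range; append matching IPs to one file.
for ip in 192.168.1.{1..10}; do
    if curl --head -sk -m 2 "https://${ip}:443" | grep -qiE "Server: Target"; then
        echo "${ip}" >> matches.txt
    fi
done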


Big Scale - parallel!

Given that your problem deals with a huge IP address span, you should probably consider a different approach.

This begs for GNU parallel.

Iterating over a big span of IP addresses in parallel with GNU parallel requires splitting the logic into several files (for the parallel command to use).

ip2int

#!/bin/bash

set -e

# Convert a dotted-quad IPv4 address to its 32-bit integer form.
# (bash arithmetic avoids spawning several expr processes per address)
function ip_to_int()
{
  local A B C D
  IFS=. read -r A B C D <<< "$1"
  echo $(( (A << 24) + (B << 16) + (C << 8) + D ))
}

# Convert a 32-bit integer back to a dotted-quad IPv4 address.
function int_to_ip()
{
  local INT="$1"
  echo "$(( (INT >> 24) & 255 )).$(( (INT >> 16) & 255 )).$(( (INT >> 8) & 255 )).$(( INT & 255 ))"
}
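A quick round-trip check of these helpers (hypothetical shell session):

$ source ip2int
$ ip_to_int 10.0.0.1
167772161
$ int_to_ip 167772161
10.0.0.1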



scan_ip

#!/bin/bash

set -e

source ip2int

if [[ $# -ne 1 ]]; then
    echo "Usage: $(basename "$0") ip_address_number"
    exit 1
fi

CONNECT_TIMEOUT=2 # in seconds
IP_ADDRESS="$(int_to_ip "${1}")"

# curl exits non-zero for unreachable hosts; suspend -e around it so the
# script can report those instead of aborting.
set +e
data=$(curl --head -vs -m ${CONNECT_TIMEOUT} https://${IP_ADDRESS}:443 2>&1)
exit_code="$?"
data=$(echo -e "${data}" | grep "Server: ")
     # adjust this pattern to the exact server version you are looking for
set -e

if [[ ${exit_code} -eq 0 ]]; then
    if [[ -n "${data}" ]]; then
        echo "${IP_ADDRESS} - ${data}"
    else
        echo "${IP_ADDRESS} - Got empty data for server!"
    fi
else
    echo "${IP_ADDRESS} - no server."
fi
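scan_ip takes the integer form of an address, so a single host can be tested like this (hypothetical session; the Server header shown is only an example):

$ chmod +x scan_ip
$ ./scan_ip 167772161
10.0.0.1 - Server: nginx/1.4.6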



scan_range

#!/bin/bash

set -e

source ip2int

START_ADDRESS="10.0.0.0"
NUM_OF_ADDRESSES="16777216" # 256 * 256 * 256

start_address_num=$(ip_to_int ${START_ADDRESS})
end_address_num=$(( start_address_num + NUM_OF_ADDRESSES - 1 )) # inclusive end, avoids scanning one address past the /8

seq ${start_address_num} ${end_address_num} | parallel -P0 ./scan_ip

# This parallel call does the same as this:
#
# for ip_num in $(seq ${start_address_num} ${end_address_num}); do
#     ./scan_ip ${ip_num}
# done
#
# only a LOT faster!
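To run the whole scan and reduce the results to matching hosts only (your third and fourth requirements), filter the combined output; "Server: Target" is again a stand-in for the version string you want. Since scan_ip prints "IP - Server: ...", awk '{print $1}' keeps just the IP:

$ chmod +x scan_range
$ ./scan_range | grep "Server: Target" | awk '{print $1}' > matching_ips.txt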


Improvement over the iterative approach:

The run time of the naive for loop (estimated at around 200 days for 256*256*256 addresses) was brought down to under a day, according to @skrskrskr.

ArnonZ
  • Thank you, that really got me going. Much appreciated!!! I took that snippet, modified it a little and ran it via `seq 16777216 | parallel -P0 ./myscript.sh > output.txt`. My machine is able to run slightly over 500 jobs in parallel and the runtime is super fast now! Running the script with the for loop had an estimated runtime of around 200 days. With parallel I am now down to a runtime of under one day. Perfect! – skrskrskr Aug 21 '14 at 12:52
  • Sustaining 500 jobs with at most 2 seconds each (the timeout) means 67,108 seconds, which is about 18.6 hours. The more servers that answer before the timeout, the less time it will take. That's without considering that parallel doesn't work in **chunks** of 500, but starts a new job as soon as one finishes. – ArnonZ Aug 22 '14 at 13:21
  • @ArnonZilca True, but if the machine can sustain 30000 jobs with no sweat then the task will go from taking a full day to taking the lunch break. – Ole Tange Aug 24 '14 at 00:00
  • Didn't say otherwise. He did write though that his machine can run slightly over 500 jobs. – ArnonZ Aug 24 '14 at 06:48

Shorter:

mycurl() {
    curl --head "https://${1}:443" | grep -iE "Server: Target" > "${1}_info.txt"
}
export -f mycurl
parallel -j0 --tag mycurl {1}.{2}.{3}.{4} ::: {10..10} ::: {0..255} ::: {0..255} ::: {0..255}
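Note that export -f is what makes mycurl visible to the shells GNU parallel spawns; without it the child shells would not know the function.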

Slightly different, using --tag instead of many _info.txt files:

parallel -j0 --tag curl --head https://{1}.{2}.{3}.{4}:443 ::: {10..10} ::: {0..255} ::: {0..255} ::: {0..255} | grep -iE "Server: Target" > info.txt
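If the single file should contain only the IPs of matching hosts, one more filter step should do it. This sketch uses GNU parallel's --tagstring option to make the tag a plain dotted IP, and again uses "Server: Target" as the placeholder pattern:

parallel -j0 --tagstring {1}.{2}.{3}.{4} curl --head https://{1}.{2}.{3}.{4}:443 ::: {10..10} ::: {0..255} ::: {0..255} ::: {0..255} | grep -iE "Server: Target" | awk '{print $1}' > matching_ips.txt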

Fan out to run more than 500 in parallel:

parallel echo {1}.{2}.{3}.{4} ::: {10..10} ::: {0..255} ::: {0..255} ::: {0..255} | \
  parallel -j100 --pipe -N1000 --load 100% --delay 1 parallel -j250 --tag -I ,,,, curl --head https://,,,,:443 | grep -iE "(Server\:\ Target)" > info.txt

This will spawn up to 100*250 jobs, but will try to find the optimal number of jobs where none of the CPUs sits idle. On my 8-core system that is 7500. Make sure you have enough RAM to run the potential maximum (25000 jobs in this case).

Ole Tange