I have a large file that contains two IPs per line, about 3 million lines in total.

Here's an example of the file:

1.32.0.0,1.32.255.255
5.72.0.0,5.75.255.255
5.180.0.0,5.183.255.255
222.127.228.22,222.127.228.23
222.127.228.24,222.127.228.24

I need to convert each IP to its decimal (integer) form, like this:

18874368,18939903
88604672,88866815
95682560,95944703
3732923414,3732923415
3732923416,3732923416

I'd prefer a way to do this strictly via command line. I'm okay with perl or python being used, as long as it doesn't require extra modules to be installed.

I thought I had come across a way that someone converted IPs like this using sed but can't seem to find that tutorial anymore. Any help would be appreciated.

Senjuti Mahapatra

4 Answers

3

If you have GNU awk installed (needed for the RT variable, which holds the text that matched the record separator), you could use this one-liner:

awk -F. -v RS='[\n,]' '{printf "%d%s", (($1*256+$2)*256+$3)*256+$4, RT}' file
18874368,18939903
88604672,88866815
95682560,95944703
3732923414,3732923415
3732923416,3732923416
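
For anyone wondering about the nested form `(($1*256+$2)*256+$3)*256+$4`: it folds the octets left to right and gives the same result as summing each octet times its power of 256. A minimal Python sketch of both forms, just to illustrate the arithmetic (the function names are made up for this example):

```python
def ip_to_int_nested(ip):
    """Fold the octets left to right: ((a*256 + b)*256 + c)*256 + d."""
    value = 0
    for octet in ip.split("."):
        value = value * 256 + int(octet)
    return value


def ip_to_int_powers(ip):
    """Same result, written as a sum of octets times powers of 256."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return a * 256**3 + b * 256**2 + c * 256 + d
```

For the first sample address, 1.32.0.0, both give 1*16777216 + 32*65536 = 18874368, matching the output above.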
user000001
  • It's ok, and has better performance, but it displays IP in scientific format. – Marek Nowaczyk Dec 08 '16 at 08:58
  • @MarekNowaczyk: That's weird, on my system the output is like above (I copy pasted from the terminal). My awk is GNU Awk 4.1.3. What version are you using? Try with `"%d%s"` as a format string... – user000001 Dec 08 '16 at 09:18
2

Here is a Python solution that uses only standard modules (re, sys):

import re
import sys

def multiplier_generator():
    """Cyclic generator of powers of 256 (from 256**3 down to 256**0).

    The multipliers tuple could be replaced by an inline calculation
    of the power, but this approach has better performance.
    """
    multipliers = (
        256**3,
        256**2,
        256**1,
        256**0,
    )
    idx = 0
    while True:
        yield multipliers[idx]
        idx = (idx + 1) % 4

def replacer(match_object):
    """re.sub replacer for ip group"""
    multiplier = multiplier_generator()
    res = 0
    for i in range(1, 5):  # groups 1..4 are the four octets
        res += next(multiplier) * int(match_object.group(i))
    return str(res)

if __name__ == "__main__":
    std_in = ""
    if len(sys.argv) > 1:
        with open(sys.argv[1],'r') as f:
            std_in = f.read()
    else:
        std_in = sys.stdin.read()
    print(re.sub(r"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)", replacer, std_in))

This solution replaces every IP address found in the text read from standard input or from the file passed as the first parameter, e.g.:

  • python convert.py < input_file.txt, or
  • python convert.py file.txt, or
  • echo "1.2.3.4, 5.6.7.8" | python convert.py.
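
Since the input is around 3 million lines, a line-by-line variant may be worth considering so the whole file is never held in memory. The following is only a sketch, not part of the script above: it assumes every line is exactly two comma-separated dotted-quad addresses, and the function names are made up. It relies on the standard-library calls `socket.inet_aton` and `struct.unpack`:

```python
import socket
import struct


def ip_to_int(ip):
    # inet_aton packs the dotted quad into 4 bytes in network (big-endian)
    # order; "!I" unpacks those bytes as one unsigned 32-bit integer.
    return struct.unpack("!I", socket.inet_aton(ip))[0]


def convert_line(line):
    # "a.b.c.d,e.f.g.h" -> "N1,N2"
    return ",".join(str(ip_to_int(ip.strip())) for ip in line.split(","))


def convert_stream(src, dst):
    # Process one line at a time; blank lines are skipped.
    for line in src:
        line = line.strip()
        if line:
            dst.write(convert_line(line) + "\n")
```

To use it as a filter, call `convert_stream(sys.stdin, sys.stdout)` after `import sys`, then run it as `python convert_stream.py < file` (the script name is arbitrary).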
Marek Nowaczyk
  • Thanks for your answer! Where in your script would I specify that I need this to be run on `test.txt`? – Frank Sinclair Dec 08 '16 at 08:17
  • It does exactly what I want, but something seems to be off with the math. If you try running my sample IPs above through your script, it's giving different IP Decimals :/ – Frank Sinclair Dec 08 '16 at 08:26
2

With bash:

ip2dec() {
  set -- ${1//./ }     # split $1 with "." to $1 $2 $3 $4
  declare -i dec       # set integer attribute
  dec=$1*256*256*256+$2*256*256+$3*256+$4
  echo -n $dec
}

while IFS=, read -r a b; do ip2dec $a; echo -n ,; ip2dec $b; echo; done < file

Output:

18874368,18939903
88604672,88866815
95682560,95944703
3732923414,3732923415
3732923416,3732923416
Cyrus
1

With bash, using bit shifts instead of multiplication. Each shift needs its own parentheses, because + binds tighter than << in shell arithmetic:

ip2dec() {  local IFS=.
            set -- $1     # split $1 with "." to $1 $2 $3 $4
            printf '%s' "$(( ($1<<24) + ($2<<16) + ($3<<8) + $4 ))"
         }

while IFS=, read -r a b; do
    printf '%s,%s\n' "$(ip2dec $a)" "$(ip2dec $b)"
done < file
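
A note on operator precedence when mixing shifts and additions: bash arithmetic follows C precedence, where `+` binds tighter than `<<`, and Python happens to use the same ordering. So an expression like `a<<24+b<<16+c<<8+d` is not the sum of four shifted octets. A small Python sketch of the difference (function names are made up for illustration; Python integers do not overflow, so the wrong value here will not match what a 64-bit shell prints, only the grouping is the same):

```python
def shift_unparenthesized(a, b, c, d):
    # Parsed as ((a << (24 + b)) << (16 + c)) << (8 + d)
    return a << 24 + b << 16 + c << 8 + d


def shift_parenthesized(a, b, c, d):
    # Each octet is shifted into place first, then the terms are added.
    return (a << 24) + (b << 16) + (c << 8) + d
```

With the octets of 1.32.0.0, the parenthesized version gives 18874368, while the unparenthesized one gives 1 << 80.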