I'm writing an HTTP parsing library (because I couldn't find a good one in pure D), and I need to validate IP addresses for the URI field, so I wrote a couple of functions to do it:

For IPv4:

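```d
import std.ascii : isDigit;
import std.conv : to;

bool isIPv4(string addr) {
    int parts;
    size_t idx; // start of the current octet

    foreach (i, c; addr) {
        if (c == '.') {
            // reject empty octets (".1.2.3", "1..2.3") and octets longer
            // than 3 digits, so that to!int below can never throw
            if (i == idx || i - idx > 3) {
                return false;
            }

            if (to!int(addr[idx .. i]) > 255) {
                return false;
            }

            parts++;
            if (parts > 3) {
                return false;
            }

            idx = i + 1;
        } else if (!isDigit(c)) {
            return false;
        }
    }

    // the last octet must also be non-empty and at most 3 digits
    if (idx == addr.length || addr.length - idx > 3) {
        return false;
    }

    if (to!int(addr[idx .. $]) > 255) {
        return false;
    }

    return parts == 3;
}
```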

And for IPv6:

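```d
import std.ascii : isHexDigit;

bool isIPv6(string addr) {
    // An empty string is not an address, and a single ':' at either end is
    // invalid (only the "::" shorthand may start or end an address).
    if (addr.length == 0) {
        return false;
    }
    if (addr[0] == ':' && (addr.length < 2 || addr[1] != ':')) {
        return false;
    }
    if (addr[$ - 1] == ':' && (addr.length < 2 || addr[$ - 2] != ':')) {
        return false;
    }

    bool isColon, hasSeparator, hasIPv4;
    int leftChunks, rightChunks, digits;

    foreach (i, c; addr) {
        if (isHexDigit(c)) {
            digits = isColon ? 1 : digits + 1;
            isColon = false;

            if (digits == 1) {
                if (hasSeparator) {
                    rightChunks++;
                } else {
                    leftChunks++;
                }
            } else if (digits > 4) {
                return false;
            }
        } else if (c == ':') {
            if (isColon) {
                // multiple :: separators not allowed
                // as is :::
                if (hasSeparator) {
                    return false;
                }
                hasSeparator = true;
            } else {
                isColon = true;
            }
        } else if (c == '.') {
            // the current chunk is really the first octet of an embedded
            // IPv4 address, so it does not count as a 16-bit group
            if (hasSeparator) {
                rightChunks--;
            } else {
                leftChunks--;
            }

            if (!isIPv4(addr[i - digits .. $])) {
                return false;
            }

            hasIPv4 = true;
            break;
        } else {
            // any other character makes the address invalid
            return false;
        }
    }

    if (hasIPv4) {
        // an embedded IPv4 address takes the place of two 16-bit groups
        if (hasSeparator) {
            if (rightChunks + leftChunks > 5) {
                return false;
            }
        } else if (leftChunks != 6) {
            return false;
        }
    } else if (digits > 0) {
        // "::" must stand for at least one group
        if (hasSeparator) {
            if (rightChunks + leftChunks > 7) {
                return false;
            }
        } else if (leftChunks != 8) {
            return false;
        }
    }

    return true;
}
```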

I initially tried to craft a regex for IPv6, but that was painful because of all the special cases (such as the :: shorthand), and I think I ran into a regex compile bug because the pattern was so long. Ideally, I would like a standard function to do this for me.

FWIW, I originally implemented the IPv4 validator using std.array.split, but switched to this approach because otherwise I would have had to detect or catch exceptions from std.conv.to!int.
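For comparison, here is a rough sketch of a split-based validator that sidesteps the exception problem. Each octet is checked for one to three ASCII digits before `std.conv.to!int` is called, so the conversion cannot throw. The name `isIPv4Split` is hypothetical, and this is not the split-based version described above.

```d
import std.algorithm.searching : all;
import std.array : split;
import std.ascii : isDigit;
import std.conv : to;

/// Hypothetical split-based IPv4 check that never lets to!int throw.
bool isIPv4Split(string addr) {
    auto octets = addr.split('.');
    if (octets.length != 4) {
        return false;
    }
    foreach (o; octets) {
        // 1-3 ASCII digits guarantees that to!int succeeds
        if (o.length == 0 || o.length > 3 || !o.all!isDigit) {
            return false;
        }
        if (o.to!int > 255) {
            return false;
        }
    }
    return true;
}
```

The trade-off is an extra allocation for the array of slices, in exchange for code that reads closer to the dotted-quad definition.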

Thanks so much!

Note

I would eventually like to try to get some of the code I've written into Phobos, so I would like the code to be as solid as possible.

beatgammit

4 Answers

How about parseAddress from std.socket?
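Here is a minimal sketch of how parseAddress could classify a URI host. It assumes that a host which is not a numeric address should simply be reported as "other" rather than treated as an error. `HostKind` and `classifyHost` are hypothetical names. `parseAddress`, `Address.addressFamily` and `SocketException` come from std.socket. An IPv6 host taken from a URI has to be stripped of its surrounding brackets first.

```d
import std.socket : AddressFamily, SocketException, parseAddress;

/// Hypothetical classification of a URI host string.
enum HostKind { ipv4, ipv6, other }

/// Classifies `host` without name resolution: parseAddress only accepts
/// numeric addresses, so anything else (e.g. a domain name) throws.
HostKind classifyHost(string host) {
    try {
        auto addr = parseAddress(host);
        switch (addr.addressFamily) {
            case AddressFamily.INET:
                return HostKind.ipv4;
            case AddressFamily.INET6:
                return HostKind.ipv6;
            default:
                return HostKind.other;
        }
    } catch (SocketException) {
        // not a numeric IPv4/IPv6 address
        return HostKind.other;
    }
}
```

Exact acceptance rules come from the platform's `getaddrinfo`, so unusual forms may behave differently on different systems.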

Vladimir Panteleev
  • Do you know if it does any DNS requests? It seems like it does for some requests, and I'd like to avoid that. Since I'm writing an HTTP parser, I just want to know what kind of address it is: IPv4, IPv6 or a domain name. – beatgammit Feb 23 '12 at 04:46
  • Well, you can see that for yourself here: https://github.com/D-Programming-Language/phobos/blob/master/std/socket.d :) – DejanLekic Feb 23 '12 at 09:55
  • The documentation says: "Does not attempt name resolution." How much clearer can it get? – Vladimir Panteleev Feb 23 '12 at 11:07
  • CyberShadow, perhaps it could be clearer if the documentation's example didn't try to resolve the host name with address.toHostNameString(). Or, if it didn't throw a SocketException when parsing failed and instead threw an InvalidAddressException or something like that. – David Eagen Feb 23 '12 at 15:53
  • Thanks! I must have missed that. I saw the getAddressInfo note there and didn't see anything saying that it doesn't attempt DNS resolution. Thank you so much! – beatgammit Feb 24 '12 at 04:04

@tjameson: Long ago I hacked together my own URI module. Here is the code: http://codepad.org/PBm5BEVP. I always wanted to go back to that module, improve it, and submit a pull request on GitHub, but never had time to do it. The URI RFC also has a regular expression for parsing IPv6 addresses inside URIs; that is definitely something I would put in this code.
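Here is a rough sketch of the first step. It is not the code from the linked module. The regular expression from RFC 3986, Appendix B splits a URI reference into its components, and group 4 holds the authority. `authorityOf` and `hostOf` are hypothetical helpers. `hostOf` assumes a well-formed authority and only strips the userinfo, the IP-literal brackets and the port, so the returned host still needs to be validated.

```d
import std.regex : matchFirst, regex;
import std.string : indexOf, lastIndexOf;

/// Hypothetical helper: the authority ("user@host:port") of a URI,
/// using the splitting regex from RFC 3986, Appendix B.
string authorityOf(string uri) {
    auto re = regex(`^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?`);
    auto m = matchFirst(uri, re);
    return m.empty ? null : m[4];
}

/// Hypothetical helper: the host part of an authority, without
/// userinfo, port, or the brackets around an IPv6 literal.
string hostOf(string authority) {
    // drop "userinfo@"
    auto at = authority.lastIndexOf('@');
    auto hostport = at >= 0 ? authority[at + 1 .. $] : authority;

    if (hostport.length && hostport[0] == '[') {
        // IP-literal: "[" IPv6address "]"
        auto close = hostport.indexOf(']');
        return close > 0 ? hostport[1 .. close] : null;
    }

    // IPv4 address or registered name, optionally followed by ":port"
    auto colon = hostport.indexOf(':');
    return colon >= 0 ? hostport[0 .. colon] : hostport;
}
```

For example, `hostOf(authorityOf("http://user@[::1]:8080/path"))` yields `::1`, which can then be handed to an IP address validator.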

DejanLekic
  • Good point about the regular expression. I think I'll revisit the regular expression route. I tried to make one to cover all the edge cases of IPv6, but it ended up being too long (and when I tried to statically compile it, D's regex engine puked). Did you know there are nine different forms for IPv6? – beatgammit Feb 23 '12 at 02:33
  • Once you get the host using the regexp from `dl.net.uri`, you simply call `std.socket.parseAddress(host)`. :) – DejanLekic Feb 23 '12 at 09:52

You could use the OS-provided inet_pton() function. It parses an address and tells you whether it is invalid. See http://www.kernel.org/doc/man-pages/online/pages/man3/inet_pton.3.html

It will parse both IPv4 and IPv6 addresses, and inet_ntop() can be used to convert the parsed address back to its canonical notation.
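Here is a minimal, POSIX-only sketch of calling these functions from D through druntime's `core.sys.posix` bindings. `isIPv4Pton`, `isIPv6Pton` and `canonicalIPv6` are hypothetical names. The buffers are sized for `struct in_addr` (4 bytes), `struct in6_addr` (16 bytes) and `INET6_ADDRSTRLEN` (46 characters).

```d
version (Posix):

import core.sys.posix.arpa.inet : inet_ntop, inet_pton;
import core.sys.posix.sys.socket : AF_INET, AF_INET6, socklen_t;
import std.string : fromStringz, toStringz;

/// True if the C library accepts `addr` as a dotted-quad IPv4 address.
bool isIPv4Pton(string addr) {
    ubyte[4] bin; // room for a struct in_addr
    return inet_pton(AF_INET, addr.toStringz, bin.ptr) == 1;
}

/// True if the C library accepts `addr` as a textual IPv6 address.
bool isIPv6Pton(string addr) {
    ubyte[16] bin; // room for a struct in6_addr
    return inet_pton(AF_INET6, addr.toStringz, bin.ptr) == 1;
}

/// Round-trips an IPv6 address through inet_pton/inet_ntop to get its
/// canonical text form, or returns null if the address does not parse.
string canonicalIPv6(string addr) {
    ubyte[16] bin;
    if (inet_pton(AF_INET6, addr.toStringz, bin.ptr) != 1) {
        return null;
    }
    char[46] text; // INET6_ADDRSTRLEN
    auto p = inet_ntop(AF_INET6, bin.ptr, text.ptr, cast(socklen_t) text.length);
    return p is null ? null : p.fromStringz.idup;
}
```

With glibc, `inet_ntop` writes lowercase hex and compresses the longest run of zero groups, so `2001:DB8:0:0:0:0:0:1` comes back as `2001:db8::1`.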

Sander Steffann
  • Is there a cross-platform way to call this? Would it be in std.linux and std.windows? – beatgammit Feb 22 '12 at 22:33
  • `inet_pton` is only available on Windows versions starting with Windows Vista. – Vladimir Panteleev Feb 23 '12 at 02:50
  • I know it is specified by POSIX (Single UNIX Specification v3, 2001, to be precise), so it is probably available on many platforms and in many languages. I don't know enough about D to point you to the correct library, though, sorry. It's a shame that only Windows Vista and later support it, but that makes sense, because older Windows versions had a completely different IP stack to which IPv6 was only added later. – Sander Steffann Feb 23 '12 at 07:44

Try:

IPv4

/^(((2[0-4]|1\d|[1-9])?\d|25[0-5])(\.(?!$)|$)){4}$

IPv6

/^((?=(?=(.*?::))\2(?!.+::))(::)?([\dA-F]{1,4}:(:|(?!$))|){5}|([\dA-F]{1,4}:){6})((([\dA-F]{1,4}((?!\4)::|:(?!$)|$))|(?!\3\4)){2}|(((2[0-4]|1\d|[1-9])?\d|25[0-5])(\.(?!$)|$)){4})$/i

(Using ECMAScript syntax)

From: http://home.deds.nl/~aeron/regex/
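As a sketch, the IPv4 pattern can be used with D's std.regex as shown below. There is one deliberate change: `\d` is replaced with `[0-9]`, because in std.regex `\d` matches any Unicode digit, not just ASCII ones. The IPv6 pattern depends on backreferences to groups captured inside lookaheads, so whether std.regex treats it the same way as an ECMAScript engine would need to be tested separately. `isIPv4Re` is a hypothetical name.

```d
import std.regex : matchFirst, regex;

/// The IPv4 pattern from this answer, with \d narrowed to ASCII [0-9].
bool isIPv4Re(string addr) {
    // Each of the 4 repetitions consumes one octet (0-255, no leading
    // zeros) followed by either a '.' that is not the last character,
    // or the end of the input.
    auto re = regex(`^(((2[0-4]|1[0-9]|[1-9])?[0-9]|25[0-5])(\.(?!$)|$)){4}$`);
    return !matchFirst(addr, re).empty;
}
```

Compiling the regex on every call keeps the sketch short. Real code would build it once and reuse it.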

Aeron