
I have an IPv6 address string, for example:

"0001:0002:0003:0004:0005:0006:0007:0008"

I am trying to find the cleanest way to convert this into a uint16_t array of 8 values holding 1 through 8.

I cannot use inet_pton().

I could write my own parsing function; however, I'm trying to get sscanf to work as shown below, without success. Any help would be appreciated. Edit: the values will always be in hex format in the string.

char *pString        = "0001:0002:0003:0004:0005:0006:0007:0008";
uint16_t* uintValues = new uint16_t[8]{};
sscanf(pString,"%s%04x:", uintValues );
Engineer999
  • Your own parsing function will be the easiest solution, especially since `::ffff:192.168.0.1` is a valid IPv6 address. Trying to cobble something together from stock C or C++ library functions, and implementing proper error checking, is just asking for a world of pain, and more questions posted to stackoverflow.com. Do your own parsing. If something has to be done right, you have to do it yourself. – Sam Varshavchik Apr 17 '20 at 12:02
  • @SamVarshavchik I forgot to add, all the values will be represented as hex – Engineer999 Apr 17 '20 at 12:05
  • Sure, now try your scanf approach with IPv6 addresses for one of Google's name servers, such as "`2001:4860:4802:34::a`". – Sam Varshavchik Apr 17 '20 at 12:08
  • @SamVarshavchik: That's still fairly easy. Count the number of missing `:`, you should have 7. `2001:4860:4802:34::a` has 5. So insert two: `0:0:0` to get `2001:4860:4802:34: 0:0:0 :a`. Parse this. – MSalters Apr 17 '20 at 12:18
  • I think we should remove the C++ tag. – A M Apr 17 '20 at 12:26
  • @ArminMontigny oh really? and "new" exists in C code right? – Engineer999 Apr 17 '20 at 12:27
  • `new` should be avoided in C++. Also, no raw pointers for owned memory; nowadays smart pointers should be used. Also no C-style arrays, and no `scanf`-like functions. And for strings, we have had `std::string` for many years. Using some C++ keywords does not make the code C++. But I have already learned here that if a C++ compiler compiles the code, it is considered C++. I do not think so. But hey, we are in a free world. Everybody can do what he wants . . . No worries – A M Apr 17 '20 at 12:34
  • @ArminMontigny I disagree. C++ gives us the power to use the language in any way we wish, depending on the requirements, platform, etc. Even when developing for a pure embedded system, where we normally can't use dynamic memory, RTTI, etc., we still use the other nice features of C++. It's also possible to write C++ code that interfaces with old libraries written in C, so in some cases we have no choice but to use raw pointers and other C-style things. – Engineer999 Apr 17 '20 at 12:40
  • @ArminMontigny _"The usage of some C++ keywords will not make the code C++."_ Of course it does. What else would it be? – Asteroids With Wings Apr 24 '20 at 10:27
  • @AsteroidsWithWings As I wrote above: *But hey, we are in a free world. Everybody can do what he wants . . . No worries*. You will continue to think that the keyword `new` makes a program C++, and I do not think so when I see `new`, `scanf` and raw pointers for owned memory. And since I am too old and badly narrow-minded, it is not worth the effort. No need to start a religious war. I am deeply sorry and do apologize for my low experience in C++. – A M Apr 24 '20 at 11:11
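
The expansion MSalters describes in the comments (work out how many groups are missing and insert zeros where the `::` was) might look roughly like the sketch below. The function name `expand_double_colon` is made up for illustration, and the sketch assumes hex-only groups, an otherwise well-formed address and no embedded IPv4 tail. It only rewrites the text; the expanded string still has to be parsed afterwards.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from any library): expands a single "::" so the
// result contains eight ':'-separated groups.
// Assumptions: hex-only groups, no embedded IPv4 tail, input otherwise well formed.
// Returns an empty string when "::" appears twice or there is no room to expand.
std::string expand_double_colon(const std::string& addr) {
    const std::size_t pos = addr.find("::");
    if (pos == std::string::npos) return addr;            // nothing to expand

    const std::string head = addr.substr(0, pos);
    const std::string tail = addr.substr(pos + 2);
    if (tail.find("::") != std::string::npos) return "";  // a second "::" is ambiguous

    // Number of groups in a colon-separated piece (0 for an empty piece).
    auto groups = [](const std::string& s) -> long {
        return s.empty() ? 0 : std::count(s.begin(), s.end(), ':') + 1;
    };
    const long missing = 8 - groups(head) - groups(tail);
    if (missing < 1) return "";

    std::string out = head;
    for (long i = 0; i < missing; ++i) {
        if (!out.empty()) out += ':';
        out += '0';
    }
    if (!tail.empty()) out += ':' + tail;
    return out;
}
```

For the address from the comments, `expand_double_colon("2001:4860:4802:34::a")` yields `2001:4860:4802:34:0:0:0:a`.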

3 Answers

0
istringstream in("0001:0002:0003:0004:0005:0006:0007:0008");
uint16_t out[8];
char colon;
in >> hex >> out[0] >> colon >> out[1] >> colon >> out[2] >> colon >> out[3] >> colon >> out[4] >> colon >> out[5] >> colon >> out[6] >> colon >> out[7];

No error checking at all, which may or may not be a problem.
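
If error checking does matter, the same stream approach can be wrapped in a loop that verifies each separator and the value range. This is only a sketch under the same assumption as above (the fully written form with eight hex groups, no `::`); the name `parse_full_ipv6` is invented here, and `std::optional` needs C++17. Stream extraction is lenient (it skips leading whitespace, for instance), so this is not a strict validator.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: parses exactly eight ':'-separated hex groups.
// Assumes the fully expanded form (no "::", no IPv4 tail).
std::optional<std::array<std::uint16_t, 8>> parse_full_ipv6(const std::string& text) {
    std::istringstream in(text);
    std::array<std::uint16_t, 8> out{};
    in >> std::hex;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (i > 0) {
            char sep = 0;
            // Every group after the first must be preceded by a colon.
            if (!(in >> sep) || sep != ':') return std::nullopt;
        }
        unsigned long value = 0;
        // Read into a wider type so out-of-range groups can be rejected.
        if (!(in >> value) || value > 0xFFFF) return std::nullopt;
        out[i] = static_cast<std::uint16_t>(value);
    }
    // Anything left after the eighth group is treated as an error.
    if (in.peek() != std::char_traits<char>::eof()) return std::nullopt;
    return out;
}
```

A caller would check the returned optional before using the array, for example `if (auto groups = parse_full_ipv6(str)) { /* use (*groups)[0] ... */ }`.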

john
  • Ok, now try to parse "::ffff:192.168.0.1", which is a valid IPv6 address. – Sam Varshavchik Apr 17 '20 at 12:05
  • @SamVarshavchik Obviously if that is a possibility then my code isn't going to work. – john Apr 17 '20 at 12:06
  • Nor will it work with "`2001:4860:4802:32::a`", which is one of Google's IPv6 name servers. – Sam Varshavchik Apr 17 '20 at 12:06
  • @SamVarshavchik The OP has clarified their situation, 'all the values will be represented as hex', so I'll let the answer stand. – john Apr 17 '20 at 12:08
  • Sure you can stand, as long as you can explain how it will end up parsing an all-hex, valid IPv6 address such as "`2001:4860:4802:32::a`". Getting the actual, correct, IPv6 address parsed out of that will be a very nice extra bonus. – Sam Varshavchik Apr 17 '20 at 12:09
  • `char colon;` ... I'm confused – Engineer999 Apr 17 '20 at 12:10
  • I think it is a great idea to actually read the technical specifications for IPv6 addresses, before attempting to write code that will parse them. – Sam Varshavchik Apr 17 '20 at 12:10
  • @Engineer999 That reads in the colons that you have in your input. Actually it would read in any non-white space character. But as I said, no error checking at all. – john Apr 17 '20 at 12:10
  • @SamVarshavchik You seem to be concerned about a problem that the OP doesn't have. I was clear enough in my answer about the limitations of my approach, and you've made them even clearer in the comments. – john Apr 17 '20 at 12:12
  • I guess sscanf should work as well; however, I can't get it right – Engineer999 Apr 17 '20 at 12:16
0

Your requirements said you cannot use inet_pton, but you could copy a (BSD-licensed) implementation from FreeBSD. Note that it calls inet_pton4() and uses NS_INADDRSZ, neither of which is defined in this excerpt:

/* int
 * inet_pton6(src, dst)
 *  convert presentation level address to network order binary form.
 * return:
 *  1 if `src' is a valid [RFC1884 2.2] address, else 0.
 * notice:
 *  (1) does not touch `dst' unless it's returning 1.
 *  (2) :: in a full address is silently ignored.
 * credit:
 *  inspired by Mark Andrews.
 * author:
 *  Paul Vixie, 1996.
 */
static int
inet_pton6(const char *src, u_char *dst)
{
    static const char xdigits_l[] = "0123456789abcdef",
              xdigits_u[] = "0123456789ABCDEF";
#define NS_IN6ADDRSZ    16
#define NS_INT16SZ  2
    u_char tmp[NS_IN6ADDRSZ], *tp, *endp, *colonp;
    const char *xdigits, *curtok;
    int ch, seen_xdigits;
    u_int val;

    memset((tp = tmp), '\0', NS_IN6ADDRSZ);
    endp = tp + NS_IN6ADDRSZ;
    colonp = NULL;
    /* Leading :: requires some special handling. */
    if (*src == ':')
        if (*++src != ':')
            return (0);
    curtok = src;
    seen_xdigits = 0;
    val = 0;
    while ((ch = *src++) != '\0') {
        const char *pch;

        if ((pch = strchr((xdigits = xdigits_l), ch)) == NULL)
            pch = strchr((xdigits = xdigits_u), ch);
        if (pch != NULL) {
            val <<= 4;
            val |= (pch - xdigits);
            if (++seen_xdigits > 4)
                return (0);
            continue;
        }
        if (ch == ':') {
            curtok = src;
            if (!seen_xdigits) {
                if (colonp)
                    return (0);
                colonp = tp;
                continue;
            } else if (*src == '\0') {
                return (0);
            }
            if (tp + NS_INT16SZ > endp)
                return (0);
            *tp++ = (u_char) (val >> 8) & 0xff;
            *tp++ = (u_char) val & 0xff;
            seen_xdigits = 0;
            val = 0;
            continue;
        }
        if (ch == '.' && ((tp + NS_INADDRSZ) <= endp) &&
            inet_pton4(curtok, tp) > 0) {
            tp += NS_INADDRSZ;
            seen_xdigits = 0;
            break;  /*%< '\\0' was seen by inet_pton4(). */
        }
        return (0);
    }
    if (seen_xdigits) {
        if (tp + NS_INT16SZ > endp)
            return (0);
        *tp++ = (u_char) (val >> 8) & 0xff;
        *tp++ = (u_char) val & 0xff;
    }
    if (colonp != NULL) {
        /*
         * Since some memmove()'s erroneously fail to handle
         * overlapping regions, we'll do the shift by hand.
         */
        const int n = tp - colonp;
        int i;

        if (tp == endp)
            return (0);
        for (i = 1; i <= n; i++) {
            endp[- i] = colonp[n - i];
            colonp[n - i] = 0;
        }
        tp = endp;
    }
    if (tp != endp)
        return (0);
    memcpy(dst, tmp, NS_IN6ADDRSZ);
    return (1);
}
Botje
  • About IPv6: I wonder who is responsible for this mess. The way UTF-8 was designed is much much better... – Antonin GAVREL Apr 17 '20 at 12:27
  • At least it was _specified_ for IPv6. For IPv4 the answer is usually "whatever `inet_aton` accepts", [such as "10.1234567" or "172.16.12345", or "0x7f000001"](https://en.wikipedia.org/wiki/Dot-decimal_notation) – Botje Apr 17 '20 at 12:42
0

I won't give you the code, but here is what you need to know to write the parser yourself:


1/ Case insensitive

An example of an IPv6 address is: 2001:0db8:85a3:0000:0000:8a2e:0370:7334. The hexadecimal digits are case-insensitive, but IETF recommendations suggest the use of lower case letters. The full representation of eight 4-digit groups may be simplified by several techniques, eliminating parts of the representation.


2/ Leading Zeroes

Leading zeroes in a group may be omitted, but each group must retain at least one hexadecimal digit. Thus, the example address may be written as:

2001:db8:85a3:0:0:8a2e:370:7334

3/ One or more consecutive groups of zeros

One or more consecutive groups containing zeros only may be replaced with a single empty group, using two consecutive colons (::). The substitution may only be applied once in the address, since multiple occurrences would create an ambiguous representation. Thus, the example address can be further simplified:

2001:db8:85a3::8a2e:370:7334

Bonus: Two Special cases.

The localhost (loopback) address, 0:0:0:0:0:0:0:1, and the IPv6 unspecified address, 0:0:0:0:0:0:0:0, are reduced to ::1 and ::, respectively.

Etc.

Enjoy!!
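
As one possible illustration of rules 1 to 3, a hand-written parser could look like the sketch below. The name `parse_ipv6_groups` and its signature are made up for this example; it accepts upper- and lower-case digits, omitted leading zeros and a single `::`, and it deliberately does not handle an embedded IPv4 tail such as `::ffff:192.168.0.1`.

```cpp
#include <array>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: hex-only IPv6 text -> eight 16-bit groups.
// Returns false on malformed input; `out` is only written on success.
bool parse_ipv6_groups(const std::string& s, std::array<std::uint16_t, 8>& out) {
    std::array<std::uint16_t, 8> g{};
    int count = 0;   // groups stored so far
    int gap = -1;    // group index at which "::" occurred, -1 if none
    std::size_t i = 0;

    if (s.compare(0, 2, "::") == 0) { gap = 0; i = 2; }   // leading "::"
    while (i < s.size()) {
        if (count == 8) return false;                      // too many groups
        unsigned value = 0;
        int digits = 0;
        while (i < s.size() && std::isxdigit(static_cast<unsigned char>(s[i]))) {
            if (++digits > 4) return false;                // at most 4 hex digits per group
            const char c = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
            value = value * 16 + (c <= '9' ? c - '0' : c - 'a' + 10);
            ++i;
        }
        if (digits == 0) return false;                     // each group needs a digit
        g[count++] = static_cast<std::uint16_t>(value);
        if (i == s.size()) break;
        if (s[i] != ':') return false;                     // unexpected character
        ++i;
        if (i < s.size() && s[i] == ':') {                 // found "::"
            if (gap != -1) return false;                   // only one "::" allowed
            gap = count;
            ++i;
        } else if (i == s.size()) {
            return false;                                  // trailing single ':'
        }
    }
    if (gap == -1) {                                       // no "::": need all 8 groups
        if (count != 8) return false;
        out = g;
        return true;
    }
    if (count == 8) return false;                          // "::" must stand for a group
    const int tail = count - gap;                          // groups after the "::"
    out.fill(0);
    for (int k = 0; k < gap; ++k) out[k] = g[k];
    for (int k = 0; k < tail; ++k) out[8 - tail + k] = g[gap + k];
    return true;
}
```

The groups that follow the `::` are moved to the end of the array and everything in between stays zero, which is the same idea as the shift loop in the FreeBSD `inet_pton6` code, applied to 16-bit groups instead of bytes.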

Antonin GAVREL
  • Thanks, but how to write the parser is not what I'm asking. That is of course a possible solution, but I'm looking to use what's already there. Shouldn't sscanf work? – Engineer999 Apr 17 '20 at 12:24
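
For the fully written form in the question, sscanf can be made to work. In the original attempt, `%s` consumes the entire string before `%04x` is ever reached, and `%x` expects a pointer to `unsigned int`, not `uint16_t`. A sketch that uses `%4hx` (which stores into `unsigned short`) might look like this; the wrapper name `scan_full_ipv6` is made up, and the approach still cannot handle `::` or omitted groups.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper; assumes the fully written form with eight hex groups.
bool scan_full_ipv6(const char* text, std::uint16_t (&out)[8]) {
    unsigned short g[8] = {};   // %hx stores into unsigned short
    char extra = 0;             // only assigned if something follows the 8th group
    const int n = std::sscanf(text, "%4hx:%4hx:%4hx:%4hx:%4hx:%4hx:%4hx:%4hx%c",
                              &g[0], &g[1], &g[2], &g[3],
                              &g[4], &g[5], &g[6], &g[7], &extra);
    if (n != 8) return false;   // fewer: malformed; 9: trailing characters
    for (int i = 0; i < 8; ++i) out[i] = static_cast<std::uint16_t>(g[i]);
    return true;
}
```

Checking that the return value is exactly 8 is what catches both truncated input and trailing junk. Note that `%x` is fairly permissive (it skips leading whitespace, for example), so this is a convenience for trusted input and not a strict validator.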