-3

I'm looking for a quick way to parse human-readable byte sizes (examples: 100, 1k, 2M, 4G) into byte values. The input is a char * and the output must be a size_t (i.e. an unsigned integer, likely 64-bit or 32-bit depending on the architecture). The code should detect invalid input and return a value indicating that the input was invalid.

Examples:

Input  => size_t result
-----------------------
"100"  => 100
"10k"  => 10240
"2M"   => 2097152
"4G"   => 4294967296 on a 64-bit machine, error (overflow) on a 32-bit machine
"ten"  => error

Here is an example fragment of code to be expanded to handle the unit prefixes:

int parse_human_readable_byte_size(char *input, size_t *result) {
    /* TODO: needs to support k, M, G, etc... */
    return sscanf(input, "%zu", result) == 1;
}

Here are some additional requirements:

  • must be done in C (no C++)
  • use only standard (or at least commonly available) libraries (e.g. sscanf, atoi)

The code is expected to run only a few times per program execution, so short, readable code is preferred over longer, higher-performance code.

ɲeuroburɳ
  • Doesn't seem like a sensible solution but maybe you can try `size_t array_t[2]` and split the big number into two halves – Suvarna Pattayil Apr 19 '13 at 15:14
  • Hi SuvP, didn't mean to imply that my code snippet above was a working solution--only to provide an example of what the code needs to do. The actual solution can come in any (similar) form. – ɲeuroburɳ Apr 19 '13 at 15:18
  • I meant that what I am suggesting doesn't come across as a sensible solution – Suvarna Pattayil Apr 19 '13 at 15:35

3 Answers

4

Here is a potential implementation. Code to detect all errors is included; fill in your own handling in place of the gotos if you like.

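/* Needs <errno.h>, <inttypes.h> and <stdint.h>; s is the input string. */
char *endp = s;
int sh;
errno = 0;
uintmax_t x = strtoumax(s, &endp, 10);
if (errno || endp == s) goto error;   /* out of range, or no digits */
switch(*endp) {
case 'k': sh=10; break;
case 'M': sh=20; break;
case 'G': sh=30; break;
case 0: sh=0; break;
default: goto error;                  /* unknown suffix */
}
if (x > SIZE_MAX>>sh) goto error;     /* result would overflow size_t */
x <<= sh;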
R.. GitHub STOP HELPING ICE
  • Very nice. I think it needs one small addition after the switch `if (sh && endp[1]) goto error;` to handle input with extra characters at end. – ɲeuroburɳ Apr 19 '13 at 15:48
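
For reference, the fragment above and the trailing-character check from the comment can be combined into one function. This is only a sketch under some assumptions: the function name and the 1/0 return convention are borrowed from the question, and the leading-digit check is an addition of mine, because strtoumax would otherwise skip leading whitespace and accept a sign (so "-1" would wrap around to a huge value).

```c
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: returns 1 and stores the byte count in *result on success,
 * 0 on invalid input or overflow. Name and return convention are
 * assumptions taken from the question, not from the answer. */
int parse_human_readable_byte_size(const char *s, size_t *result) {
    char *endp;
    int sh;

    /* Assumption: reject leading whitespace and signs, which strtoumax
     * itself would accept. */
    if (!isdigit((unsigned char)s[0])) return 0;

    errno = 0;
    uintmax_t x = strtoumax(s, &endp, 10);
    if (errno || endp == s) return 0;      /* out of range, or no digits */

    switch (*endp) {
    case 'k': sh = 10; break;
    case 'M': sh = 20; break;
    case 'G': sh = 30; break;
    case 0:   sh = 0;  break;
    default:  return 0;                    /* unknown suffix */
    }
    if (sh && endp[1]) return 0;           /* trailing text such as "1kB" */
    if (x > (SIZE_MAX >> sh)) return 0;    /* would overflow size_t */

    *result = (size_t)(x << sh);
    return 1;
}
```

A caller would pass the string and a pointer to a size_t, then check the return value before using the result, e.g. `if (!parse_human_readable_byte_size(arg, &n)) { /* report bad size */ }`.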
0

I'll try with a sub-function that analyzes the input char by char.

Besides the obvious error checks, I'd have it translate each unit symbol into a numeric constant and multiply the parsed number by the constant corresponding to that symbol.
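
One possible reading of this idea, as a rough sketch rather than a definitive implementation: walk the digits one character at a time, then look up a multiplier for the unit symbol. The names `unit_multiplier` and `parse_size_by_char` are made up for illustration, and the code assumes size_t is at least 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiplier for a unit symbol, 0 if unknown. */
static size_t unit_multiplier(char c) {
    switch (c) {
    case 'k': return (size_t)1 << 10;
    case 'M': return (size_t)1 << 20;
    case 'G': return (size_t)1 << 30;
    default:  return 0;
    }
}

/* Returns 1 on success, 0 on invalid input or overflow. */
int parse_size_by_char(const char *s, size_t *result) {
    size_t value = 0;
    const char *p = s;

    if (*p < '0' || *p > '9') return 0;    /* need at least one digit */

    /* Accumulate decimal digits, checking for overflow at every step. */
    for (; *p >= '0' && *p <= '9'; p++) {
        size_t digit = (size_t)(*p - '0');
        if (value > (SIZE_MAX - digit) / 10) return 0;
        value = value * 10 + digit;
    }

    /* Optional single unit symbol, with nothing allowed after it. */
    if (*p != '\0') {
        size_t mult = unit_multiplier(*p);
        if (mult == 0 || p[1] != '\0') return 0;
        if (value > SIZE_MAX / mult) return 0;
        value *= mult;
    }

    *result = value;
    return 1;
}
```

Adding another unit only means adding a case to `unit_multiplier`; the overflow check against SIZE_MAX then rejects values that do not fit on the current platform.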

FdT
0

Based on the accepted answer, I updated the snippet. It supports floating-point input (like 1.5k) and hexadecimal input (like 0x55k), drops the gotos, and uses a string as the list of units, which avoids the switch and makes it easy to add new units.

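static const char *human_readable_suffix = "kMGT";

/* Requires <errno.h>, <stdint.h>, <stdlib.h> and <string.h>. */
size_t *parse_human_readable(char *input, size_t *target) {
    char *endp = input;
    const char *match = NULL;
    size_t shift = 0;
    errno = 0;

    /* strtold also accepts fractions (1.5) and hexadecimal (0x55). */
    long double value = strtold(input, &endp);
    if(errno || endp == input || value < 0)
        return NULL;

    /* strchr also matches the terminating '\0', so "no suffix" is valid. */
    if(!(match = strchr(human_readable_suffix, *endp)))
        return NULL;

    if(*match) {
        /* Reject trailing characters after the suffix, e.g. "1kB". */
        if(endp[1])
            return NULL;
        shift = (match - human_readable_suffix + 1) * 10;
    }

    /* 1ULL keeps the shift defined where unsigned long is only 32 bits. */
    value *= (long double)(1ULL << shift);

    /* Reject NaN and anything that does not fit in size_t. */
    if(!(value < (long double)SIZE_MAX))
        return NULL;

    *target = (size_t)value;

    return target;
}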

Here are the test results:

1337   =>           1337 [ok, expected: 1337]
857.54 =>            857 [ok, expected: 857]
128k   =>         131072 [ok, expected: 131072]
1.5k   =>           1536 [ok, expected: 1536]
8M     =>        8388608 [ok, expected: 8388608]
0x55   =>             85 [ok, expected: 85]
0x55k  =>          87040 [ok, expected: 87040]
1T     =>  1099511627776 [ok, expected: 1099511627776]
32.    =>             32 [ok, expected: 32]
-87    => error (expected)
abcd   => error (expected)
32x    => error (expected)

Full code can be found at: https://gist.github.com/maxux/786a9b8bf55fb0696f7e31b8fa3f6b9d

Maxux