9

I have a file which stores values like 2.32x7. I read the floating-point part using:

fscanf(file, "%lf", &value);

It works perfectly... except for when the file stores something like 0x2. In that case, it reads the entire string as a hexadecimal value.

How can I prevent this from happening? I would like fscanf to just read 0 and ignore x2.

Edit: As suggested by @dbush, I am adding a sample input file.

I am parsing polynomials. So, the input file will be something like:

0x2+2.32x7-4x-9

Runnable example.
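
The behavior can be reproduced without a file. The sketch below uses `strtod()` on a string, on the assumption (per C11 7.21.6.2) that `%lf` accepts the same number format as `strtod()`. The helper name `leading_value` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns the value strtod() extracts from the start
 * of s and stores the number of characters it consumed in *consumed. */
double leading_value(const char *s, size_t *consumed)
{
    char *end;

    assert(s != NULL && consumed != NULL);

    double v = strtod(s, &end);
    *consumed = (size_t)(end - s);
    return v;
}
```

Called on `"0x2+2.32x7-4x-9"`, this should report the value 2 with three characters consumed, because `0x2` is a valid hexadecimal floating-point constant. Called on `"2.32x7"`, it stops at the `x` as intended.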

Qwertiy
Off Kilter
    Tough one! `fscanf("%lf", ...)` must behave like `strtod()`, and [`strtod()` (C11 7.22.1.3)](https://port70.net/~nsz/c/c11/n1570.html#7.22.1.3) has to accept `"0x2"` as a valid input. Your best bet is to parse manually, re-writing the `scanf()` (or `strtod()`) code without the `"x"` part (maybe read into a string, replace the `'x'` with something else, then `sscanf()` from there). *Maybe change the original file to `"2.37*7"`?* – pmg Mar 05 '22 at 16:13
  • Please post a sample input file that demonstrates some of the problem cases you're worried about as well as how you would expect that file to be read. That will give us a better idea as to how it should be parsed. – dbush Mar 05 '22 at 16:22
  • 1
    Your best bet is probably to tokenize the string and not use fscanf. – AndersK Mar 05 '22 at 16:52
  • @AndersK Yes. But I have already written the entire project and changing it is going to be a pain... I am hoping for an easy fix. If there is no better way, I am going to handle the case where the number starts with 0 separately... I will keep your advice in mind the next time I write something like this. – Off Kilter Mar 05 '22 at 17:14
  • 6
    @OffKilter Frankly, if you keep using `fscanf()` this probably won't be the last surprise you get from how it parses input. The entire `*scanf()` family is perverse, at best. And **when** it goes off the rails again, your input stream is left in an unknown state. – Andrew Henle Mar 05 '22 at 17:59
  • 3
    The standard advice for `scanf()` applies equally to `fscanf()`... **Use exclusively `fgets()` for user input; forget `scanf()` (or `fscanf()`) exists**. – pmg Mar 05 '22 at 18:03
  • 4
    Note that `strtod()` only accepts `0x2` as the start of a number when it is followed by a binary exponent `0x2P3` or something similar. However, `fscanf()` et al probably only use one character of pushback, so there isn't much it can do when it finds the `+` after the `0x2` except refuse to make a conversion. Basically, you'll need to rewrite the code that parses the input differently. I'd expect to read a (long) line of input and then parse it somewhat ad hoc. Since you're not interested in hex, you simply don't have to recognize it. Why would someone enter a term `0x2` in your polynomial? – Jonathan Leffler Mar 05 '22 at 18:07
  • 3
    Oh, Ugh!! Further down the specification of `strtod()` are the weasel words: _or if a binary exponent part does not appear in a hexadecimal floating point number, an exponent part of the appropriate type with value zero is assumed to follow the last digit in the string._ It has to accept `0x2` as `0x2P0`, it seems. You'll have to sanitize the strings given to `strtod()`. You might be able to do something like `size_t num_char = strspn(input, "+-.0123456789");` if you're only accepting fixed point decimals and not exponential notation. Add `eE` to the string if you accept exponential notation too. – Jonathan Leffler Mar 05 '22 at 18:12
  • 1
    Or you may simply look for characters up to the letter `x` if your polynomials are always in terms of `x` and not `y`, etc. As other people also said, the `scanf()` family of functions is not appropriate for parsing your input — you will have to use some other techniques (non-standard functions, probably of your own devising). – Jonathan Leffler Mar 05 '22 at 18:13
  • Best would be fgets and tokenizing as mentioned above, but you could also think about a drop-in replacement for using fscanf that just collects the allowed characters and then applies sscanf to them @JonathanLeffler? Reading an `x` would then result in an `ungetc(ch, file)`. And the OP would not need to rewrite his solution. There would be no additional drawbacks - would that work? – Stephan Schlecht Mar 05 '22 at 18:29
  • 2
    @StephanSchlecht — something along those general lines is what I have in mind. I'd be tempted to use `strtod()` rather than `sscanf()` to process the coefficients, and `strtol()` to process the exponents (I assume fractional exponents are not allowed in the polynomials). I would definitely read whole lines — `fgets()` or POSIX `getline()` — and parse them. The OP would need to show their current code before we can help much more. – Jonathan Leffler Mar 05 '22 at 18:32
  • I'm pretty new to C, so all your advice is much appreciated. I was aware that scanf() has a bad reputation, but I didn't know that it is this bad. I am already using fgetc() and ungetc() to parse the polynomial. Since it appears that there is no nice fix that does not involve rewriting a significant part of this project, I am going to simply consume any leading zero and call it a day. – Off Kilter Mar 05 '22 at 19:05
  • @OffKilter "stores values like 2.32x7" --> is that the same as `23200000.0` or `2.32*2.32* 2.32* 2.32*2.32*2.32* 2.32` or what? – chux - Reinstate Monica Apr 01 '22 at 11:26
  • @chux-ReinstateMonica — it's a term of a polynomial in x: 2.32 x⁷ – Jonathan Leffler Apr 01 '22 at 11:29
  • @JonathanLeffler "...fscanf() et al probably only use one character of pushback..." -- They are *required by the standard* to use only that and no more, with all the funny differences between how `*scanf` parses and `strto*` parses which that entails. – DevSolar Apr 01 '22 at 11:42
  • @OffKilter The *best* idea, of course, would be to arrange the input to have proper whitespacing. `0x2+2.32x7-4x-9` is *several* kinds of ambiguous... – DevSolar Apr 01 '22 at 12:00
  • Why would you enter a term in the polynomial where the coefficient is zero? That seems perverse. If the coefficient is zero, you can drop that term, unless your code requires each power to be represented (so x3+4 would be represented internally as 1x3+0x2+0x1+4). – Jonathan Leffler Apr 01 '22 at 12:26
  • `fscanf` is not for such syntax analyzers. You have to create little more complex program and not expect to get result with 1 function. – i486 Apr 01 '22 at 12:32
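
The `strspn()` suggestion from the comments can be sketched as follows. This is only an illustration under stated assumptions: the coefficient is a plain decimal without exponent notation, it fits in a 63-character buffer, and the helper name `parse_decimal_prefix` is invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts the decimal number at the start of s.
 * On success returns 1, stores the value in *out and the number of
 * characters used in *len. Returns 0 if s does not start with a number. */
int parse_decimal_prefix(const char *s, double *out, size_t *len)
{
    char buf[64];
    char *end;

    assert(s != NULL && out != NULL && len != NULL);

    /* Only these characters are copied, so strtod() never sees the 'x'. */
    size_t n = strspn(s, "+-.0123456789");
    if (n == 0 || n >= sizeof buf)
        return 0;
    memcpy(buf, s, n);
    buf[n] = '\0';

    *out = strtod(buf, &end);
    if (end == buf)
        return 0;
    *len = (size_t)(end - buf);
    return 1;
}
```

After a successful call the caller would look at `s[len]` to see whether an `x` and an exponent follow. Because the copy handed to `strtod()` cannot contain an `x`, the input `0x2` yields the coefficient 0 with one character consumed.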

3 Answers

1

Reading the line with fgets() and then parsing it with hand-crafted code is the most robust approach.


To read text as a single floating point number with fscanf() up to an 'x', first read with a scanset and then convert.

char buf[400 + 1];  // Something large - consider DBL_MAX may be 1e308
// Scanset of expected FP characters
#define FP_FMT " %400[-+.eE0-9]"
// or maybe simply (use one definition or the other, not both)
// #define FP_FMT " %400[^x]"

// file is the FILE * being read
if (fscanf(file, FP_FMT, buf) == 1 && sscanf(buf, "%lf", &value) == 1) {
  // Success
}

Pedantic code would use strtod() instead of sscanf(buf, "%lf", &value).

Other considerations include locale use of ',' as the decimal point, NAN, infinity, even wider text such as exact FP values, how to handle ill-formatted text, input errors, EOF, ...


Consider scanning the pair of value and exponent in 1 step.

// Caution: %d accepts a sign, so "4x-9" is read as the power -9
if (fscanf(file, FP_FMT "x%d", buf, &power) == 2 && sscanf(buf, "%lf", &value) == 1) {
  // Success
}
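
A fuller sketch of the scanset idea is given below. It is a variation, not the code above: the sign is read on its own and the exponent is read as unsigned digits, so that `3-4x` and `4x-9` split into separate terms. The function name `read_term`, the buffer sizes, and the rule that a bare `x` means coefficient 1 and power 1 are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one term such as "2.32x7", "-4x" or "-9".
 * Returns 1 on success, 0 on end of input or malformed text. */
int read_term(FILE *f, double *coef, int *power)
{
    char sign[2] = "";
    char num[64] = "";
    char digits[10] = "";
    char *end;
    int c;

    assert(f != NULL && coef != NULL && power != NULL);

    /* Optional sign, read separately from the digits. */
    if (fscanf(f, " %1[-+]", sign) == EOF)
        return 0;

    /* Digits and '.' only: the 'x' can never reach strtod(). */
    int have_num = (fscanf(f, "%63[.0-9]", num) == 1);
    if (have_num) {
        *coef = strtod(num, &end);
        if (end == num || *end != '\0')
            return 0;               /* e.g. "." or "1.2.3" */
    } else {
        *coef = 1.0;                /* bare "x" means 1x */
    }
    if (sign[0] == '-')
        *coef = -*coef;

    c = fgetc(f);
    if (c == 'x') {
        *power = 1;                 /* "x" without digits means x^1 */
        if (fscanf(f, "%9[0-9]", digits) == 1)
            *power = (int)strtol(digits, NULL, 10);
        return 1;
    }
    if (c != EOF)
        ungetc(c, f);               /* not part of this term */
    *power = 0;
    return have_num;                /* a constant needs at least a number */
}
```

Calling `read_term()` in a loop on the sample input `0x2+2.32x7-4x-9` should yield the terms (0, 2), (2.32, 7), (-4, 1) and (-9, 0) before returning 0 at end of input.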
chux - Reinstate Monica
1

For your purpose, you should read the full polynomial with fgets() and parse it with ad hoc code using strtod() and strtoul():

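#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

struct polynomial;  /* defined elsewhere in the project */
/* assumed helper: adds factor * x^exponent to the polynomial */
void add_component(struct polynomial *p, double factor, int exponent);

int parse_polynomial(struct polynomial *p, const char *s) {
    for (;;) {
        double factor = 1;
        int exponent = 0;
        char *end;
        while (isspace((unsigned char)*s)) {
            s++;
        }
        if (*s == '\0')
            break;
        if (*s == '+') {
            s++;
        } else
        if (*s == '-') {
            factor = -1;
            s++;  /* consume the sign so strtod() does not apply it again */
        }
        if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
            /* strtod() would read "0x2" as hexadecimal: the coefficient is 0 */
            factor = 0;
            s++;
        } else
        if (*s != 'x') {
            errno = 0;
            factor *= strtod(s, &end);
            if (end == s || errno != 0) {
                /* parse error */
                break;
            }
            s = end;
        }
        if (*s == 'x') {
            exponent = 1;
            s += 1;
            if (isdigit((unsigned char)*s)) {
                unsigned long ul;
                errno = 0;
                ul = strtoul(s, &end, 10);
                if (end == s || errno != 0 || ul > INT_MAX)
                    break;
                exponent = (int)ul;
                s = end;
            }
        }
        add_component(p, factor, exponent);
    }
    /* 0 if the whole string was parsed, -1 on a parse error */
    return (*s == '\0') ? 0 : -1;
}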
chqrlie
0

❗❗❗ This incorrectly handles numbers with a decimal point...

Still thinking about how to fix that.

You can read the numeric characters, then parse number from the string: example

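#include <stdio.h>

int main(void)
{
  double value = 0;  // stays 0 if no digits are read
  char s[32];

  *s = 0;
  scanf("%31[0-9]", s);        // digits only, so the 'x' is never consumed
  sscanf(s, "%lf", &value);    // convert the collected digits

  printf("%f\n", value);

  *s = 0;
  scanf("%3s", s);             // show what is left in the input
  puts(s);
}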

If you need negative numbers too: example

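#include <stdio.h>

int main(void)
{
  double value = 0;  // stays 0 if no digits are read
  char s[32];

  *s = 0;
  scanf("%1[-+]", s);          // optional sign
  scanf("%30[0-9]", s+!!*s);   // digits go after the sign, if there was one
  sscanf(s, "%lf", &value);    // convert sign and digits together

  printf("%f\n", value);

  *s = 0;
  scanf("%3s", s);             // show what is left in the input
  puts(s);
}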

Note that the last code eats the sign even if it's not followed by digits.
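
One possible way around the decimal-point problem is to scan the integer digits, at most one dot, and then the fraction digits as three separate steps, so a second dot is never consumed. The sketch below is only an illustration: it works on a string with `sscanf()` and `%n` rather than on standard input, and the name `scan_decimal` and the 30-digit limits are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses digits, at most one '.', then digits.
 * Returns 1 and fills *value and *consumed, or 0 if there are no digits. */
int scan_decimal(const char *s, double *value, size_t *consumed)
{
    char ip[31] = "";   /* integer part */
    char fp[31] = "";   /* fraction part */
    char buf[64];
    size_t pos = 0;
    int n = 0;

    assert(s != NULL && value != NULL && consumed != NULL);

    if (sscanf(s, "%30[0-9]%n", ip, &n) == 1)
        pos += (size_t)n;
    if (s[pos] == '.') {            /* accept a single dot only */
        pos++;
        n = 0;
        if (sscanf(s + pos, "%30[0-9]%n", fp, &n) == 1)
            pos += (size_t)n;
    }
    if (ip[0] == '\0' && fp[0] == '\0')
        return 0;                   /* no digits at all */

    /* Rebuild a clean decimal string; an empty part becomes "0". */
    snprintf(buf, sizeof buf, "%s.%s", ip[0] ? ip : "0", fp[0] ? fp : "0");
    *value = strtod(buf, NULL);
    *consumed = pos;
    return 1;
}
```

With this approach `2.32x7` gives 2.32 with four characters consumed, `0x2` gives 0 with one character consumed, and `1.2.3` stops after `1.2`.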

Qwertiy
  • 1
    Note that there's an off-by-one buffer overflow error in `char s[16]; … scanf("%16[0-9]", s);` — for the given character array, you need to specify `15` (not `16`) in the format string. – Jonathan Leffler Apr 01 '22 at 11:32
  • @JonathanLeffler, oops, fixed. Also have to subtract 2 in the second version. – Qwertiy Apr 01 '22 at 11:54
  • @chux-ReinstateMonica, yep, forgot to subtract space for the terminating zero and the sign. Fixed it. I think the concrete size is up to the author, as only he knows the format. – Qwertiy Apr 01 '22 at 11:56
  • @chux-ReinstateMonica, forgot to delete this line when editing original code. Second read always failed without changing the variable, so missed it. Removed now. – Qwertiy Apr 01 '22 at 12:17
  • Consider how OP's "2.32x7" would work or not. Maybe add `'.'` to the scanset? – chux - Reinstate Monica Apr 01 '22 at 12:19
  • @chux-ReinstateMonica, have to think about it. If I add it to the format, it could consume multiple dots and a longer sequence than it should. – Qwertiy Apr 01 '22 at 12:43