
I have a file format like this

1.9969199999999998  2.4613199999999997  130.81278270000001  AA
2.4613199999999997  2.5541999999999998  138.59131554109211  BB
2.5541999999999998  2.9953799999999995  146.83238401449094  CC
...........................

I have to read the first three columns as floats and the last column as a char array in C. All the columns are tab-separated and there is a newline character at the end of each line. Everything works fine with fscanf(fp1, "%f\t%f\t%f\t%s\n", ...) as long as there is some text at the end of each line (the char string part).

There are cases where, instead of AA/BB/CC, the file has an empty string. How do I handle that case? I have tried fscanf(fp1, "%f\t%f\t%f\t%s[^\n]\n", ...) and many other things, but I am unable to figure out the right way. Can you please help me out here?

tadman
Swapnil
  • Don't say "Doesn't work", it's really, *really* irritating. Instead say what it *does* do that you don't want it to, or more specifically, what it isn't doing that you *do* want it to. – tadman Jan 23 '18 at 16:58
  • Using `float` rather than `double` will throw away half the digits shown. You get 6-7 decimal digits with `float`; you get 15+ digits with `double`. – Jonathan Leffler Jan 23 '18 at 17:07
  • As to your main question: use `fgets()` to read lines and then `sscanf()` to parse the line that is read. This will avoid confusion. When the input is line-based but not regular enough, don't use `fscanf()` and family to read the data; the file-reading `scanf()` functions don't care about newlines, even when you do. (Note that `sscanf()` will return either 3 or 4, indicating whether there was a string at the end of a line or not. Always test the return value from `scanf()` and friends, but do so carefully.) – Jonathan Leffler Jan 23 '18 at 17:08
  • Me, I would not use any of the *scanf family to solve this sort of problem. I would read whole lines using `fgets`, then split each line up into tab-separated fields. Splitting up a line into fields (based on delimiter(s)) is straightforward: you can use `strtok` or `strsep` for this. (Or see https://www.eskimo.com/~scs/cclass/notes/sx10h.html .) – Steve Summit Jan 23 '18 at 17:25
  • @tadman Sorry for the wrong choice of words. Edited the question! Thanks for pointing out :) – Swapnil Jan 24 '18 at 04:59
  • @JonathanLeffler Thank you for the help. It worked for me. If you can put it as an answer, I can mark it as correct. Thank you. – Swapnil Jan 24 '18 at 06:13

1 Answer


Using float rather than double will throw away about half the digits shown. You get 6-7 decimal digits with float; you get 15+ digits with double.

As to your main question: use fgets() (or POSIX getline()) to read lines and then sscanf() to parse the line that is read. This will avoid confusion. When the input is line-based but not regular enough, don't use fscanf() and family to read the data — the file-reading scanf() functions don't care about newlines, even when you do.

Note that sscanf() will return either 3 or 4, indicating whether there was a string at the end of a line or not (or EOF, 0, 1 or 2 if it is given an empty string, or a string which doesn't start with a number, or a string which only contains one or two numbers). Always test the return value from scanf() and friends — but do so carefully. Look for the number of values that you expect (3 or 4 in this example), rather than 'not EOF'.

This leads to roughly:

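#include <stdio.h>

int main(void)
{
    double d[3];
    char text[20];      /* label: up to 19 characters plus the null terminator */
    char line[4096];

    /* Read one line at a time so each record is parsed in isolation. */
    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        /* Blanks in the format match any white space, including tabs. */
        int rc = sscanf(line, "%lf %lf %lf %19s", &d[0], &d[1], &d[2], text);
        if (rc == 4)        /* three numbers and a label */
            printf("%13.6f  %13.6f  %13.6f  [%s]\n", d[0], d[1], d[2], text);
        else if (rc == 3)   /* three numbers, empty label */
            printf("%13.6f  %13.6f  %13.6f  -NA-\n", d[0], d[1], d[2]);
        else                /* EOF, 0, 1 or 2: malformed line */
            printf("Format error: return code %d\n", rc);
    }
    return 0;
}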

If given this file as standard input:

1.9969199999999998  2.4613199999999997  130.81278270000001  AA
2.4613199999999997  2.5541999999999998  138.59131554109211  BB
2.5541999999999998  2.9953799999999995  146.83238401449094  CC
19.20212223242525  29.3031323334353637 3940.41424344454647
19.20212223242525  29.3031323334353637 3940.41424344454647  PolyVinyl-PolySaccharide

the output is:

     1.996920       2.461320     130.812783  [AA]
     2.461320       2.554200     138.591316  [BB]
     2.554200       2.995380     146.832384  [CC]
    19.202122      29.303132    3940.414243  -NA-
    19.202122      29.303132    3940.414243  [PolyVinyl-PolySacch]

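You can tweak the output format to suit yourself. Note that %19s stores at most 19 characters, which leaves room for the terminating null byte in the 20-byte text array, so it avoids buffer overflow even when the text is longer than 19 characters; the last output line shows the label truncated to 19 characters.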

Jonathan Leffler