13

I'm trying to use fscanf to read in data, and part of the input is a float followed by the letter 'e', for example, 41.72elapsed. When writing the format string for fscanf, I attempted to use "%felapsed", but this doesn't work, as %fe is its own format specifier. How would I read this in using fscanf?

edit: Here is the code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define CHAR_MAX 1024

int main(int argc, char **argv)
{
    FILE *file_in = fopen(argv[1], "r+");
    char out_name[CHAR_MAX];
    strcpy(out_name, argv[1]);
    strcat(out_name, ".csv");
    FILE *csv_out = fopen(out_name, "w");
    int minutes;
    float seconds;
    fprintf(csv_out, "Trial #, Execution Time\n");

    for (int i = 0; fscanf(file_in, "%*fuser %*fsystem %d:%felapsed %*d%%CPU (%*davgtest+%*davgdata %*dmaxresident)k\n%*dinputs+%*doutputs (%*dmajor+%*dminor)pagefaults %*dswaps\n", &minutes, &seconds) == 2; i++) {
         fprintf(csv_out, "%d, %d:%.2f\n", i, minutes, seconds);
     };
    return 0;
}

Here is some sample input:

283.97user 0.69system 1:13.77elapsed 385%CPU (0avgtext+0avgdata 107472maxresident)k

0inputs+4616outputs (0major+9550minor)pagefaults 0swaps

287.87user 0.35system 1:14.41elapsed 387%CPU (0avgtext+0avgdata 107328maxresident)k

0inputs+4616outputs (0major+9524minor)pagefaults 0swaps
user2093696
  • "*`%fe` is its own format specifier.*" is it? What would it expect though? – alk Mar 18 '16 at 14:15
  • 1
    Yes, it is. the e specifies scientific notation with exponent. The reason for my question is that, having actually tested this, using %fe does not work for that reason. – user2093696 Mar 18 '16 at 14:21
  • "*the e specifies scientific notation*" no it doesn't. Are you perhaps referring to `%e`? The latter is equivalent to `%f`. Conversion specifiers are not suffixed. Also, there is no length modifier `f`. – alk Mar 18 '16 at 14:23
  • 1
    To avoid misunderstandings, you better provide some code, the exact input, as well as the exact expected result. – alk Mar 18 '16 at 14:43
  • It's going to be very hard -- I suspect more trouble than it's worth -- to parse this with `scanf`. That 'e' is always going to give you trouble. You might want to try a regex package instead. – Steve Summit Mar 18 '16 at 14:57
  • @alk, the problem could be that OP's scanf() sees `"13.77e"` as the start of an FP number, when it hopefully would unget the `e` because it is not followed by a number (an exponent). – chux - Reinstate Monica Mar 18 '16 at 14:58
  • By the way, I'd try to fix that data you must deal with as well. Appears to be like some half-assed csv conversion. People should produce appropriately formatted data, which means to use separators which cannot appear in the data, or escape them if they do, etc. – Peter - Reinstate Monica Mar 18 '16 at 15:44
  • @PeterA.Schneider I had the same thought, but it appears to be the standard sort of line printed by the time(1) command (the one in /bin, not the shell builtin). – Steve Summit Mar 20 '16 at 13:30
  • @user2093696 I think we've figured out your parsing problem, but as the situation demonstrates pretty well, these "41.72elapsed" strings were really not designed to be parsed by machine. If what you're trying to do is audit the elapsed time and other resources consumed by a series of tests, a completely different approach would be to use the `getrusage` system call. Alternatively you might be able to find the information you need in `/proc`. – Steve Summit Mar 20 '16 at 13:34
  • @SteveSummit Interesting. Who in their right mind would do that? The output of `time` is easily improved though with a format string -- e.g. `/usr/bin/time -f "%U user %S system %E elapsed %P CPU (%X text + %D data %M max)" ls` (the default format with spaces inserted at strategic places) yields the output `0.00 user 0.00 system 0:00.00 elapsed 0% CPU (0 text + 0 data 848 max)` which should eliminate the parsing problems. – Peter - Reinstate Monica Apr 04 '16 at 14:21
  • @PeterA.Schneider I did not know that `time` had joined the family of programs accepting fmt arguments! Thanks for the tip. – Steve Summit Apr 04 '16 at 15:44

4 Answers

6

This is a problem with scanf()

FP formats like "%f" see the e as introducing an exponent. Since the e is not followed by a number, scanning for the float stops. But scanf() has already read one character past the e, and C does not require scanf() to be able to back up more than 1 character. So code is out of luck using a simple "%f".

Some systems will back up more than 1 character, but C does not require that capability.

Code needs a new approach - scan in seconds as a string

char sec[20];
int cnt = fscanf(file_in, "%d:%19[0-9. ]elapsed", &minutes, sec);
if (cnt == 2) {
  seconds = atof(sec); 
  ...
}
Steve Summit
chux - Reinstate Monica
  • That's too bad. Anyway, thanks for the advice, I'll try out your solution. – user2093696 Mar 18 '16 at 15:13
  • @user2093696 Note: The C spec has a footnote "fscanf pushes back at most one input character onto the input stream." – chux - Reinstate Monica Mar 18 '16 at 15:17
  • You can also see that this behavior is required by the standard in the discussion of "input items" (7.19.6.2.9 in n1256). "An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread." So it looks to me like even if an implementation was able to push back more characters, it wouldn't be allowed to. It makes sense that you'd want deterministic behavior, even if it isn't ideal. – Nate Eldredge Mar 18 '16 at 15:31
  • BTW: Since it is unclear if the seconds might be printed with a leading space as in `" 1.23"`, updated my answer. – chux - Reinstate Monica Mar 18 '16 at 16:14
4

There's simply no need for the "elapsed" in your format. The scanf family of functions will read as much as they can from the input, and for a floating-point number they will stop reading when they hit a non-digit character, i.e. the e in elapsed.

So the format only needs to be "%f", and that's it. I.e.

float value;
fscanf(file, "%f", &value);

If you want to read and discard the elapsed part, use "%*s". The asterisk tells scanf (and family) to read the matching input but not store it, so the full call would look like

float value;
fscanf(file, "%f%*s", &value);

After seeing your code, it can be much simpler and easier with something like

char input[512];
for (int i = 0; fgets(input, sizeof input, file_in) != NULL; ++i) {
    if (sscanf(input, "%*f%*s %*f%*s %d:%f%*s", &minutes, &seconds) == 2) {
        fprintf(csv_out, "%d, %d:%.2f\n", i, minutes, seconds);
    }
}

Since this loop uses fgets instead of direct fscanf you will read all lines in the input file, not only just the first one. Also since fgets is used we don't need the sscanf function to actually parse the parts of the string we don't need (which is most of it), instead we only have sscanf parse the input string until we have the data we need.

Some programmer dude
  • 2
    Joachim, it seems his implementation/compiler/library is consuming the 'e' as the beginning of scientific notation. Then reading on fails but it doesn't push back the 'e'. – Paul Ogilvie Mar 18 '16 at 14:49
  • Yes, that is what appears to be happening based on what I can see. – user2093696 Mar 18 '16 at 14:51
  • Yes, the `*`, the "*assignment-suppression character*" should do the job. :-) – alk Mar 18 '16 at 14:53
  • How do you mean? As in, %*felapsed? Because I need the value read in, I can't discard it. – user2093696 Mar 18 '16 at 15:01
  • @user2093696 Updated my code with an alternative to your loop. Since you don't actually extract anything from after the `"elapsed"` part, you don't actually need `fscanf` to parse it – Some programmer dude Mar 18 '16 at 15:15
2

This is a bit of a hack and may be too brittle, but:

The float you want to parse seems to be a time in minute.second format, with positive integers. If the producer of the data reliably pads small numbers with zero (e.g. 1:02.03), you can simply use a fixed field length of 5, because seconds and minutes will never be larger than 59 and thus always be two characters wide each:

sscanf("12.345678", "%5f%s", &f, buf);
will read 12.34 into f and 5678 into buf. (The same, of course, with "12.34elapsed". I just wanted to make it unmistakably clear that only 5 characters of the input are consumed.)
Peter - Reinstate Monica
1

Let's do an experiment:

#include <stdio.h>

int main (void)
{
    float fp;
    scanf("%f", &fp);
    printf("%f", fp); 
}

Input: 123e4

Output: 1230000.000000

As you can see, 'e' is treated as part of the floating-point number matched by "%f".

For me, the simplest solution is to use scanf("%f%*s ", &f);. After "%f" stops, the remaining "lapsed" is consumed by "%*s" without causing problems. The 'e' itself is simply discarded, because the C spec has a footnote "fscanf pushes back at most one input character onto the input stream."

BTW: Do you have to process the floating-point numbers? If not, what about simply treating them as strings? For example, scanf("%[^e]elapsed", str);?

nalzok
  • In OP's case, the `"e"` of `"elapsed"` is consumed by the `"%f"`, so a following `"%*s"` would not consume `"elapsed"` but `"lapsed"`. – chux - Reinstate Monica Mar 18 '16 at 15:36
  • `#include <stdio.h> int main (void) { float fp, fp1; char str[100]; scanf("%f%s %f", &fp, str, &fp1); printf("%f|%s|%f", fp, str, fp1); }` – nalzok Mar 18 '16 at 15:38
  • Input: `123ert456 789`; Output:`123.000000|ert456|789.000000` – nalzok Mar 18 '16 at 15:38
  • At least on my machine, `ert456` seems to be consumed by `str`. – nalzok Mar 18 '16 at 15:39
  • It works on your platform but that does not mean it works on OP's as your platform is going beyond the C spec. – chux - Reinstate Monica Mar 18 '16 at 15:39
  • @chux Could you show me the document concerning `scanf()` then? PS: I'm using clang on OS X. – nalzok Mar 18 '16 at 15:42
  • sun qingyao: @Nate Eldredge posted the [relevant part of the spec](http://stackoverflow.com/questions/36086616/how-to-scanf-a-float-followed-immediately-by-the-letter-e-in-c/36088386?noredirect=1#comment59820979_36087496). Also see the [footnote](http://stackoverflow.com/questions/36086616/how-to-scanf-a-float-followed-immediately-by-the-letter-e-in-c/36088386?noredirect=1#comment59820353_36087496) The `e`, in OP's machine, is consumed as part of the `float` – chux - Reinstate Monica Mar 18 '16 at 15:44
  • @chux So you think "123e" should be considered as an "input item"? What about the absence of following digits? – nalzok Mar 18 '16 at 15:52
  • @chux I don't know where the **input** format of floating-point numbers is specified, but at least, "123e" isn't a valid floating constant (N1256 6.4.4.2) – nalzok Mar 18 '16 at 16:00
  • 1
    The C spec's footnote (footnotes are not specs) says "fscanf pushes back at most one input character onto the input stream.", so with `"123el"`, `scanf()`, in scanning a `float`, sees the `1`, `2`, `3`, `e` and then `'l'`. It pushes back `'l'` and _wants_ to push back `'e'`. Some platforms, like yours, do not follow the footnote and push back the `'e'`. Some platforms, like OP's, follow the footnote, leaving the `'e'` consumed and not available for subsequent parsing. – chux - Reinstate Monica Mar 18 '16 at 16:06
  • To be clear, it isn't that `"123e"` is a valid FP number or not, it is a question of can `scanf()` put the `'e'` back since it has read the character after `'e'`. – chux - Reinstate Monica Mar 18 '16 at 16:08