I am new to C, and I am trying to build a C program that scans through a file until EOF, picks out lines that contain a certain keyword and then sets an offset after the last line was searched. When the scan is executed again, it scans the file, this time starting from the saved offset and continues downward until EOF.

I am trying to wrap my head around the different file I/O functions, and I'm having trouble piecing together the order in which to call fopen(), fseek(), fgets(), ftell(), etc. to do what I want. Can anyone point me in the right direction or walk me through what I need to get this done?

Thank you!

roxycandigit

3 Answers

I would recommend using getline (a POSIX function, not part of standard C) for reading, ftell and fseek for getting and setting the offset, and strstr for searching individual lines in your case.

I'm not sure I understand what your saving of the offset is all about, but it might look like this:

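#define _POSIX_C_SOURCE 200809L /* exposes getline(), a POSIX function */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints every line containing keyword, starting at *offset.
   On success stores the position where scanning stopped in *offset
   and returns 0; returns 1 on error. */
int pick_lines(const char *filename, const char *keyword, long *offset)
{
    FILE *fp;
    char *line = NULL;
    size_t len = 0;

    if (offset == NULL || (fp = fopen(filename, "r")) == NULL)
        return 1;

    /* resume where the previous call stopped */
    if (*offset > 0 && fseek(fp, *offset, SEEK_SET) != 0) {
        fclose(fp);
        return 1;
    }

    /* getline() grows the buffer as needed, so long lines are not truncated */
    while (getline(&line, &len, fp) != -1) {
        if (strstr(line, keyword) != NULL)
            printf("%s", line); // or do something else with chosen line
    }

    /* the loop ended at EOF, so this stores the EOF position */
    if ((*offset = ftell(fp)) < 0) {
        free(line);
        fclose(fp);
        return 1;
    }

    free(line);
    fclose(fp);
    return 0;
}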

Here offset is an in/out parameter. Its dereferenced value is used to seek to a given offset (start with *offset == 0) and is then reset to the new offset.

This function would just print every line that contains keyword. If you want to return an array of lines instead, a little extra work is needed.
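As a rough idea of what that extra work could look like, here is a minimal sketch that copies the matching lines into a dynamically grown array instead of printing them. The name collect_lines and its out/count parameters are made up for this illustration, and it assumes a POSIX system (getline and strdup).

```c
#define _POSIX_C_SOURCE 200809L /* for getline() and strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of pick_lines(): instead of printing, it stores a copy
 * of each matching line in a growing array. On success returns 0, sets *out
 * and *count, and advances *offset; the caller frees each line and the array. */
int collect_lines(const char *filename, const char *keyword, long *offset,
                  char ***out, size_t *count)
{
    char **lines = NULL, *line = NULL;
    size_t used = 0, cap = 0, len = 0;
    int failed = 0;
    long pos;
    FILE *fp = fopen(filename, "r");

    if (fp == NULL)
        return 1;
    if (fseek(fp, *offset, SEEK_SET) != 0) {
        fclose(fp);
        return 1;
    }

    while (!failed && getline(&line, &len, fp) != -1) {
        if (strstr(line, keyword) == NULL)
            continue;
        if (used == cap) {              /* grow the array geometrically */
            size_t new_cap = cap ? cap * 2 : 8;
            char **tmp = realloc(lines, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                failed = 1;
                continue;               /* loop condition ends the scan */
            }
            lines = tmp;
            cap = new_cap;
        }
        if ((lines[used] = strdup(line)) == NULL)
            failed = 1;
        else
            used++;
    }

    pos = ftell(fp);
    free(line);
    fclose(fp);

    if (failed || pos < 0) {            /* undo partial work on error */
        while (used > 0)
            free(lines[--used]);
        free(lines);
        return 1;
    }
    *offset = pos;
    *out = lines;                       /* NULL when nothing matched */
    *count = used;
    return 0;
}
```

Each stored line keeps its trailing newline, exactly as getline returned it. The caller owns the result and should free every lines[i] and then the array itself.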

An example of usage might be:

long offset = 0;
pick_lines(filename, keyword, &offset);
// append lines to file
pick_lines(filename, keyword, &offset);
// ...
MC93
  • `*offset = ftell(fp)` set to `EOF` position. also need free & fclose before `return 1;` – BLUEPIXY Jul 17 '15 at 12:35
  • `if ((*offset = ftell(fp)) < 0) { .. }` this wrong position. because end of while meant End Of File. – BLUEPIXY Jul 17 '15 at 13:48
  • Yes, `*offset` will be set to the EOF position, that's the point. Why should that be wrong? – MC93 Jul 17 '15 at 13:50
  • _picks out lines that contain a certain keyword and then sets an offset after the last line was searched._ But your's always set `EOF` position. your example's 2nd call is meaningless. – BLUEPIXY Jul 17 '15 at 13:51
  • _"a C program that scans through a file until EOF,...then sets an offset after the last line was searched"_. I took that to mean the whole file is always scanned and the EOF positon is stored for a later scan (after the file has been appended to for instance, see my example of usage). But as I say: _"I'm not sure I understand what your saving of the offset is all about"_. This is just an example of which I/O functions to use and how to use them, it can be tweaked to stop scanning wherever you like, the point is you'll use `ftell` to store the offset. – MC93 Jul 17 '15 at 14:03
  • _When the scan is executed again, it scans the file, this time starting from the saved offset and continues downward until EOF._ If the phrase read from the restart position, if purpose is to save the position of the EOF, "until EOF" is funny. – BLUEPIXY Jul 17 '15 at 14:13
  • "your example's 2nd call is meaningless": No, it isn't, not if you append to the file in between calls as my comment `// append lines to file` is meant to suggest (`FILE *fp = fopen(filename, "a"); fprintf(fp, /*additional lines*/); fclose(fp);`). When I do so then it prints the added lines that contain `keyword` without searching the previous lines. – MC93 Jul 17 '15 at 14:18
  • yes, I understood that your said. But the assumption that files have been added when OP(_the scan is executed again_ only) scan again has not been presented. – BLUEPIXY Jul 17 '15 at 14:19
  • I repeat: I'm not entirely sure what was meant by storing the offset to be able to skip to it later, my interpretation is the only one that makes sense to me. I suggest you provide your own interpretation and/or solution. But that's not the point anyway. Maybe it's meant a little differently, but even so a similar usage of the I/O functions @roxycandigit was enquiring about would provide a solution imho. The code I posted is just an example of what it "might" look like. The point is using `getline` (which @roxycandigit didn't seem to know about), `ftell` and so on should be the way to go. – MC93 Jul 17 '15 at 15:00
  • OP must be to clarify the question. – BLUEPIXY Jul 17 '15 at 15:06

You could do it like this (just pseudocode):

fopen();
offset = loadOffset();
fseek(offset); // set offset from previous run
while(!feof())
{
  fgets();
  if(searchKeyword() == true)
  {
    offset = ftell(); // getting the offset (after the line you just read)
    doSomething();

  }
}
saveOffset(offset);
fclose();

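Hint: Be careful with feof(): it returns true only after an input operation has failed because of EOF. If the file pointer is at EOF but no read has failed yet, it returns false, so the loop above calls fgets() one extra time and that call returns NULL. You have to handle that case, for example by checking the return value of fgets() before using the buffer.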
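For reference, the pseudocode could be turned into real C roughly as follows. This is only a sketch under a few assumptions that are not in the question: the offset is kept as decimal text in a separate state file, lines are shorter than the 1024-byte buffer, and the names load_offset, save_offset and scan_since_last_run are invented here. The loop tests the return value of fgets() directly, which sidesteps the feof() problem.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the saved offset, or 0 if none exists yet. */
static long load_offset(const char *state_path)
{
    long offset = 0;
    FILE *sp = fopen(state_path, "r");

    if (sp != NULL) {
        if (fscanf(sp, "%ld", &offset) != 1 || offset < 0)
            offset = 0;
        fclose(sp);
    }
    return offset;
}

/* Hypothetical helper: stores the offset as decimal text. */
static int save_offset(const char *state_path, long offset)
{
    FILE *sp = fopen(state_path, "w");

    if (sp == NULL)
        return -1;
    fprintf(sp, "%ld\n", offset);
    return fclose(sp) == 0 ? 0 : -1;
}

/* Prints matching lines found after the saved offset.
 * Returns the number of matches, or -1 on error. */
int scan_since_last_run(const char *path, const char *state_path,
                        const char *keyword)
{
    char line[1024];                 /* assumption: lines fit in this buffer */
    int matches = 0, read_error;
    long offset = load_offset(state_path);
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fseek(fp, offset, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    /* fgets() returns NULL at EOF or on error, so no feof() test is needed */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, keyword) != NULL) {
            long pos = ftell(fp);    /* position just after the matching line */
            if (pos < 0) {
                fclose(fp);
                return -1;
            }
            offset = pos;
            fputs(line, stdout);
            matches++;
        }
    }

    read_error = ferror(fp);
    fclose(fp);
    if (read_error || save_offset(state_path, offset) != 0)
        return -1;
    return matches;
}
```

As in the pseudocode, the offset only moves forward when a line matches, so non-matching lines after the last hit are scanned again on the next run. Calling ftell() after every fgets() instead would record the end of the last line read.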

robin.koch

It sounds like what you want to do is begin the file with a "header" which defines where the last result was found. This way, that information is written and stored in the file itself. An 8-digit hex value could be adequate for representing the offset in a file of size up to 4GB. Something like:

00000022<cr><lf>
Text...<cr><lf>
More text...<cr><lf>
~ <cr><lf>  <-- this '~' is whatever we're looking for
Other stuff...<cr><lf>

I'm making some assumptions here. First, this is on Windows, where text lines are terminated by <cr> and <lf> characters (0x0D and 0x0A respectively). On Unix, including modern macOS, it will be <lf> only; classic Mac OS used <cr> only. I counted both characters in this example. Second, this assumes ANSI-style strings, meaning an 8-bit encoding (one character = one byte of data). The same functionality can be achieved with Unicode encodings, just note that they are no longer exactly one byte per character: UTF-8 uses one to four bytes per character, and UTF-16 uses two or four. So expect trouble if mixing Unicode and ANSI string operations.

Here, the "header" value is 0x22 or 34 decimal, and if you count all of the characters starting from the beginning of the file, the '~' is reached at the 34th count. So the "header" points to where the last search result was found.
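To double-check that count, a small sketch like the following can compute it. The sample array reproduces the example file with explicit CR LF endings, and first_index_of is just a helper name chosen for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* The example file contents with explicit CR LF line endings. */
static const char sample[] =
    "00000022\r\n"
    "Text...\r\n"
    "More text...\r\n"
    "~ \r\n"
    "Other stuff...\r\n";

/* Returns the zero-based byte index of the first occurrence of c, or -1. */
long first_index_of(const char *data, char c)
{
    const char *p = strchr(data, c);

    return p != NULL ? (long)(p - data) : -1L;
}
```

The three lines before the '~' take 10 + 9 + 14 = 33 bytes, so first_index_of(sample, '~') yields the zero-based index 33, which is the 34th byte when counting from one, matching the header value 0x22.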

It works like this: initially the header value is zero, so your code reads it and knows the file hasn't been searched yet. Let's say the code scans through the file, incrementing a counter for each character, until it finds the '~' character. It then seeks back to the beginning, converts the count into 8 hex text characters (sprintf with "%08X"; itoa is non-standard), and overwrites that part of the file with it. Once one is found you can stop, or keep processing to search for more. The next time the file is processed, your code reads the header value, converts it from hex text into an unsigned integer (strtoul with base 16; atoi would parse it as decimal), seeks to just past the previous match (since we don't want to catch it again), then starts scanning again. If the header stores a zero-based offset, that means seeking to the offset plus one; if it stores a one-based count as in the example above, the count itself is already the zero-based offset of the next byte.
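A minimal sketch of that header scheme might look like this. It assumes the file already starts with 8 hex digits, stores the one-based count from the example, and opens the file in binary mode ("r+b") so that byte offsets are exact even with CR LF line endings. The names read_header, write_header and find_next are invented for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_DIGITS 8 /* e.g. "00000022" */

/* Reads the 8-hex-digit header at the start of fp. Returns 0 on success. */
int read_header(FILE *fp, unsigned long *value)
{
    char buf[HEADER_DIGITS + 1];
    char *end;

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, HEADER_DIGITS, fp) != HEADER_DIGITS)
        return -1;
    buf[HEADER_DIGITS] = '\0';
    *value = strtoul(buf, &end, 16);          /* base 16: header is hex text */
    return end == buf + HEADER_DIGITS ? 0 : -1;
}

/* Overwrites the header in place with value as 8 hex digits. */
int write_header(FILE *fp, unsigned long value)
{
    if (value > 0xFFFFFFFFUL)                 /* would not fit in 8 digits */
        return -1;
    if (fseek(fp, 0L, SEEK_SET) != 0)         /* also switches read -> write */
        return -1;
    if (fprintf(fp, "%08lX", value) != HEADER_DIGITS)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}

/* Looks for the next occurrence of target after the recorded position.
 * Returns 1 if found (header updated, *found_count set to the one-based
 * byte count), 0 if not found, -1 on error. */
int find_next(const char *path, char target, unsigned long *found_count)
{
    unsigned long pos;
    int c, result = 0;
    FILE *fp = fopen(path, "r+b");

    if (fp == NULL)
        return -1;
    if (read_header(fp, &pos) != 0) {
        fclose(fp);
        return -1;
    }
    if (pos < HEADER_DIGITS)                  /* never searched: skip header */
        pos = HEADER_DIGITS;
    /* a one-based count equals the zero-based offset of the next byte;
       the cast assumes the value fits in a long */
    if (fseek(fp, (long)pos, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    while ((c = getc(fp)) != EOF) {
        pos++;                                /* one-based count of this byte */
        if (c == target) {
            result = 1;
            break;
        }
    }
    if (ferror(fp))
        result = -1;
    if (result == 1) {
        if (write_header(fp, pos) != 0)
            result = -1;
        else
            *found_count = pos;
    }
    fclose(fp);
    return result;
}
```

With the example file and a header of 00000000, the first call to find_next(path, '~', &n) finds the '~' at count 34 and rewrites the header to 00000022; a second call starts right after it and reports nothing further.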

The others here have some good examples of actual code to start experimenting with. Note that if you're looking for more than just a character, such as a word or series of digits, the scanning portion becomes slower and more complex. Complex scanning of "tokens" instead of simple characters or words is called lexical analysis, and that is a whole other topic. Look up Flex and Bison or Yacc, etc.

rdtsc