0

I have a function that receives the name of a file as an argument. The idea is to read each word in the given file and save each one in a linked list (as a struct with a value and a pointer to the next struct). I could get it working for small files, but when I give a big .txt file I get a segmentation fault. Using gdb I could figure out that this happens at the while(fscanf(fi, "%s", value) != EOF){ line. For some reason when the file is bigger the fscanf() segfaults. As I could figure out the linked list part, here I pasted just enough code to compile and for you to see my problem.

So my question are: Why fscanf() segfauts with big .txt files (thousands of words), but not with small file (ten words)?

By the way, is there a better way to check for the end of the file?

Thanks in advance.

bool read(const char* file){
    // open file
    FILE* fi = fopen(file, "r"); //file is a variable that contains the name of the file to be opened
    if (fi == NULL)
    {
        return false;
    }

    // malloc for value
    char* value = malloc(sizeof(int));

    // fscanf() until the end of the file
    while(fscanf(fi, "%s", value) != EOF){ // HERE IS MY PROBLEM
        // some code for the linked list
        // where the value will be saved at the linked list
    }

    // free space
    free(value);

    // close the file
    fclose(fi);

    return true;
}
Gabriel
  • 367
  • 2
  • 5
  • 15
  • `char* value = malloc(sizeof(int));` ? – 2501 Dec 09 '14 at 23:11
  • To allocate 4 bytes. Something wrong? It compiles. – Gabriel Dec 09 '14 at 23:12
  • @Gabriel: How big are your strings? You've allocated 4 bytes which will hold 3 characters and a NULL terminator... – Blastfurnace Dec 09 '14 at 23:13
  • The biggest word at the small file is 6 characters. It does not segfaults. OK. But I also tried using char value[50] and same problem. – Gabriel Dec 09 '14 at 23:21
  • 1
    You realize that "doesn't segfault" does not imply "working correctly", right? Your allocation strategy is utterly incorrect. – Ed S. Dec 09 '14 at 23:21
  • OK. I figure out that. Per reading the answers and comments I tried using a 54 characters words and it really segfaults. But now I can't understand why, allocating 4 bytes, a 6 characters worked and a 54 characters word did not. You know why? – Gabriel Dec 09 '14 at 23:29

2 Answers2

1

No, here is your problem:

 char* value = malloc(sizeof(int));   //  <<<<<<< You allocate only place for an int 

 while(fscanf(fi, "%s", value) != EOF){ // <<<<<<< but you read a huge string 

So you end up with a buffer overflow !

You have to make sure that you never overflow the size of your buffer by setting some limits. For example by using the width field of fscanf() to indicate max size of chars to be read for the string:

 char* value = malloc(512);   // Allocate your buffer 
 while(fscanf(fi, "%511s", value) != EOF){ // read max 511 chars + 1 char for terminating 0  
    ...
Christophe
  • 68,716
  • 7
  • 72
  • 138
  • But fscanf() reads a string at a time... By the way I also tried "%50s" as the format string. And why it works at small files? – Gabriel Dec 09 '14 at 23:16
  • 2
    @Gabriel: Why do you think the size of an `int` has anything to do with the size of the character strings you're reading? – Blastfurnace Dec 09 '14 at 23:16
  • @Gabriel the buffer ofverflow can be nasty. The damages of memory corruption is unpredictable. Small overflow can have no noticeable impact (although it's always a time bomb). But the bigger the overflow and the more certain you'll be that your program will really show symptoms of the damages taking place under the surface. – Christophe Dec 09 '14 at 23:29
1

(disclaimer: simplified explanation)

A char* is a pointer to an address of memory. It specifies that it points to an array of characters. A malloc call reserves a block of memory of a certain size.

Your line

char* value = malloc(sizeof(int));

creates a character array that can hold 4 characters (as an int is 4 bytes long generally). And for it to be a complete string the last character has to be a NULL terminator '\0', So really it can only hold 3 readable characters.

You should make that malloc create a block of memory that is larger than the biggest string in the file. Or you could use another safer method such as fgets : http://www.cplusplus.com/reference/cstdio/fgets/

Simba
  • 1,641
  • 1
  • 11
  • 16
  • Well, `fgets` is a completely different animal than `fscanf`, and neither is really *safer* than the other. Might still be the function that OP really wants. – Deduplicator Dec 09 '14 at 23:20
  • Agreed, I don't think OP really needs the full functionality of a `scanf`, he just wants to read in a string. `fgets` makes it a bit harder to get a buffer overflow, esp for a beginner. – Simba Dec 09 '14 at 23:21
  • Ah, I wasn't clear seems like. `fscanf` is for getting a line, `fscanf` with `%99s` for getting a non-whitespace run. – Deduplicator Dec 09 '14 at 23:24