
I'm working on a project in C that reads a text file and converts it to an array of booleans. First I read the file into a string of size n (an unsigned char array), then I use a function to convert that string to a boolean array of size n * 8. The function works perfectly, no questions on that.

I get the string from the file using this code:

unsigned char *Data_in; // define pointer to string
int i;

FILE* sp = fopen("file.txt", "r"); //open file

fseek(sp, 0, SEEK_END);            // points sp to the end of file
int data_dim = ftell(sp);          // Returns the position of the pointer (amount of bytes from beginning to end)
rewind(sp);                        // points sp to the beginning of file

Data_in = (unsigned char *) malloc ( data_dim * sizeof(unsigned char) ); //allocate memory for string
unsigned char carac; //define auxiliary variable 

for(i=0; feof(sp) == 0; i++)       // while end of file is not reached (0)
{
   carac = fgetc(sp);              //read character from file to char
   Data_in[i] = carac;             // put char in its corresponding position
}
//

fclose(sp);                        //close file

The thing is that I have a text file made with Notepad in Windows XP. Inside it I have this 4-character string ":\n\nC" (colon, enter key, enter key, capital C).

This is what it looks like with HxD (hex editor): 3A 0D 0A 0D 0A 43.

This table makes it clearer:

character             hex      decimal    binary
 :                    3A       58         0011 1010
 \n (enter+newline)   0D 0A    13 10      0000 1101 0000 1010    
 \n (enter+newline)   0D 0A    13 10      0000 1101 0000 1010
 C                    43       67         0100 0011

Now, I execute the program, which prints that part in binary, so I get:

character      hex      decimal      binary
 :             3A         58         0011 1010
 (newline)     0A         10         0000 1010    
 (newline)     0A         10         0000 1010
 C             43         67         0100 0011

Well, now that this is shown, I ask the questions:

  • Is the reading correct?
  • If so, why does it take the 0Ds out?
  • How does that work?
Jonathan Leffler

4 Answers


Make the fopen binary:

fopen("file.txt", "rb");
                    ^

Otherwise your standard library will just eat away the \r (0x0D).


As a side note, opening the file in binary mode also avoids another problem on DOS and Windows: in text mode, a Ctrl-Z byte (0x1A) in the middle of the file is treated as EOF.
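A minimal sketch of the fix, for anyone who wants to try it. The helper names `write_sample` and `count_bytes` and the scratch file name are made up for illustration; only the standard `fopen`, `fwrite`, `fgetc`, and `fclose` calls are assumed. It writes the six bytes from the question and counts what `fgetc()` hands back in a given mode.

```c
#include <stdio.h>

/* Hypothetical helper: write the six bytes from the question
   (3A 0D 0A 0D 0A 43) to `path`. "wb" keeps them untranslated. */
int write_sample(const char *path)
{
    static const unsigned char bytes[] = { 0x3A, 0x0D, 0x0A, 0x0D, 0x0A, 0x43 };
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(bytes, 1, sizeof bytes, fp);
    if (fclose(fp) != 0 || written != sizeof bytes)
        return -1;
    return 0;
}

/* Hypothetical helper: count the bytes fgetc() delivers when `path`
   is opened with `mode` ("r" or "rb"). Returns -1 if fopen fails. */
long count_bytes(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

After `write_sample("sample.bin")`, `count_bytes("sample.bin", "rb")` sees all 6 bytes on any platform. With "r", a Windows C runtime would be expected to report 4, matching the output in the question; on Unix-like systems the two modes behave the same.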

cnicutar
  • That's interesting, now it works perfectly. Also, your side note seems to have answered another question about another problem I think I'm having, thanks! – Machine-Code Reader May 29 '12 at 07:15

This happens because you're opening the file in text mode. If you open it in binary mode, you will see both characters. To do that, pass "rb" as the mode when opening the file. Also use fread to read the file contents.
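One way this could look, as a sketch only: `read_all` is a hypothetical helper, not something from the question. It opens the file with "rb" and lets `fread` fill a buffer that grows as needed, so it does not depend on the `ftell` size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole file at `path` in binary mode.
   On success returns a malloc'ed buffer (the caller frees it) and
   stores the byte count in *len; on failure returns NULL. */
unsigned char *read_all(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, used = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    for (;;) {
        used += fread(buf + used, 1, cap - used, fp);
        if (used < cap)               /* short read: EOF or error */
            break;
        /* Buffer is full: double it and keep reading. */
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            fclose(fp);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    int failed = ferror(fp);          /* tell a read error apart from EOF */
    fclose(fp);
    if (failed) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

For the file in the question, `*len` should come back as 6, with the 0D bytes still in place.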

Superman

In addition to the "rb" issue, there's one more error: you'll read an extra character at the end, because feof(sp) still returns 0 right after the last character has been read. It returns nonzero only after you have attempted to read past EOF. This is a common beginner's mistake. The idiomatic C code to iterate over input characters is:

int c;   /* int, not char due to EOF. */

while ((c = fgetc(sp)) != EOF) {
   /* Work with c. */
}
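To make the extra iteration visible, here is a small sketch that counts loop iterations both ways on the same stream. The function names `count_with_feof` and `count_with_idiom` are made up for this illustration.

```c
#include <stdio.h>

/* Counts iterations the way the question's loop does:
   test feof() first, then read. */
long count_with_feof(FILE *fp)
{
    long n = 0;
    rewind(fp);                 /* also clears the EOF indicator */
    while (feof(fp) == 0) {
        fgetc(fp);              /* the last call returns EOF but is still counted */
        n++;
    }
    return n;
}

/* Counts iterations with the read-then-test idiom. */
long count_with_idiom(FILE *fp)
{
    long n = 0;
    int c;
    rewind(fp);
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

On a stream holding the three bytes "abc", the idiom counts 3 and the feof() version counts 4. In the question's code, that extra iteration stores a bogus value, and once the file is read in binary mode it lands one element past the data_dim bytes that were allocated.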
Jens

The other answers have discussed binary vs text mode input.

Your code actually has a separate problem in it. This idiom is for Pascal, not C:

for (i = 0; feof(sp) == 0; i++)
{
   carac = fgetc(sp);
   Data_in[i] = carac;
}

The trouble is that when the fgetc() gets EOF, you treat it as a character (probably mapping it to ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS). The feof() test is misplaced; it does not detect EOF in advance of the attempt to read the next character. Additionally, the function fgetc() and its relatives getc() and getchar() all return an int, not a char. You must learn to use the standard C idiom:

int c;
for (i = 0; (c = fgetc(sp)) != EOF; i++)
   Data_in[i] = c;

The idiom is the combination of assignment and test. The counting around it is less standard; in fact, it is likely to be fairly uncommon. But it is not wrong; it is applicable to your program.

There's no need to use feof() in most C code; virtually any time you use it, it is a mistake. Not always; it exists for a purpose. But that purpose is to distinguish between EOF and an error after a function such as fgetc() has returned EOF, not to test whether you've reached the EOF yet before a reading function says it has reached EOF. (In all my hundreds of programs, I don't think there are more than a very few references to feof(): 2884 source files, 18 references to feof(), and most of those in code originally written by other people.)
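As a sketch of where feof() does belong, the hypothetical helper below (`read_chars` and its status codes are invented for this example) reads with the standard idiom and only consults feof() after fgetc() has already returned EOF, to tell a genuine end of file from a read error.

```c
#include <stdio.h>

/* Hypothetical helper: copy up to `cap` bytes from fp into buf.
   Returns the number of bytes stored and sets *status to
   0 = end of file reached, 1 = buffer full, -1 = read error. */
size_t read_chars(FILE *fp, unsigned char *buf, size_t cap, int *status)
{
    size_t i = 0;
    int c = 0;                  /* int, so EOF stays distinct from every byte */

    while (i < cap && (c = fgetc(fp)) != EOF)
        buf[i++] = (unsigned char)c;

    if (c != EOF)
        *status = 1;            /* stopped because the buffer filled up */
    else if (feof(fp))
        *status = 0;            /* genuine end of file */
    else
        *status = -1;           /* EOF was returned because of a read error */
    return i;
}
```

The bounds check on `cap` is an addition of this sketch, not part of the idiom itself; it keeps the loop from writing past the buffer if the file turns out larger than expected.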

Jonathan Leffler
  • I don't know why you say it's Pascal code when I don't know a single bit of Pascal, and I'm using mingw32 to compile all ANSI C code, but you have a point with the feof() thing; it's really easier to use EOF. And isn't it the same to use a char in this case? I mean, fgetc returns an int, but it can be interpreted as a char; setting aside what we/the compiler call it, it's just 8 bits, right? Or could the fgetc function return a value greater than 255 and smaller than 2^32? Anyway, thanks for the answer, it was very informative! – Machine-Code Reader May 29 '12 at 07:38
  • fgetc can't return a char, because in addition to the 256 possible char values, it needs to return a 257th: EOF, which is usually #defined as -1. So you need a type at least 9 bits wide. Using int was the choice of the language designers. – Jens May 29 '12 at 08:07
  • @Machine-CodeReader The reason for suggesting Pascal is that it is an error in (standard) Pascal to attempt to read from a file that's reached EOF, so you have to test for EOF before trying the I/O (which is guaranteed not to fail on account of EOF). Jens has nicely summarized the reason for `fgetc()` returning `int`. It is one of the pitfalls that people fall into when learning C (more usually with `getchar()` than `fgetc()`, but the logic is the same). – Jonathan Leffler May 29 '12 at 13:59
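
The point made in these comments can be checked with a tiny sketch. The function name `survives_as_eof` is hypothetical, and the sketch assumes the usual case where int is wider than char. It mimics `carac = fgetc(sp)` from the question and reports whether the stored value can still be recognised as EOF.

```c
#include <stdio.h>

/* Mimics `carac = fgetc(sp)` from the question: the int result is
   squeezed into an unsigned char, then widened again for comparison.
   Returns 1 if the stored value still equals EOF, 0 otherwise. */
int survives_as_eof(int fgetc_result)
{
    unsigned char carac = (unsigned char)fgetc_result;
    int widened = carac;        /* non-negative after the conversion */
    return widened == EOF;      /* EOF is negative, so this never matches */
}
```

`survives_as_eof(EOF)` returns 0: once the result is stored in an unsigned char, the end-of-file marker is gone. Where EOF is -1 and chars are 8 bits, what gets stored instead is 0xFF, the ÿ mentioned in the answer, which is indistinguishable from a real 0xFF byte in the file.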