I'm just starting out with C and I'd like to understand a few things.

I have a text file, generated by Notepad or directly from the shell with echo, on Windows.

When I run this program, the output shows extra characters. What am I doing wrong? How can I safely read a text file character by character?

I'm using Code::Blocks with MinGW.

file.txt:

TEST

C program

void main()
{
   int i;
   FILE *fp;

   fp = fopen("file.txt","r");

   while ((i = fgetc(fp)) != EOF)
   {
      printf("%c",i);
   }
}

Output

 ■T E S T

Guilherme Viebig

2 Answers

Your code has issues (void main is non-standard, and the result of fopen is never checked), but the output is what it should be for that input.

Your file is likely UTF-8 with, confusingly enough, a byte order mark (BOM) at the beginning. Your program is correctly reading and printing the bytes of the BOM, which then appear in the output as strange characters before the actual text.

Of course, UTF-8 should never need a byte order mark (it's 8-bit bytes!), but that doesn't prevent some less clued-in programs from including one. Windows Notepad is first on the list of such programs.
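To make the byte order mark idea concrete, here is a minimal sketch that classifies the first bytes of a file. The names bom_kind and detect_bom are made up for this example, and it only knows about the UTF-8 and UTF-16 marks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not part of any library. */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

/* Inspect the first bytes of a buffer and report which byte order mark,
   if any, it starts with. UTF-32 marks are not handled: a UTF-32LE file
   (FF FE 00 00) would be reported as UTF-16LE by this sketch. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;      /* EF BB BF */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;   /* FF FE */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;   /* FE FF */
    return BOM_NONE;
}
```

To use it, read the first three bytes of the file (opened in binary mode) into a buffer and pass them in. A file without any mark, such as one saved as ANSI, yields BOM_NONE.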

UPDATE: I didn't consider the spacing between your letters, which of course indicates 16-bit input: each letter takes two bytes, and the second, zero byte is printed as a gap. That's your problem right there, then. Your C code is not reading wide characters.
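As a rough sketch of what reading 16-bit input could look like with plain fgetc, the function below combines two bytes at a time into one UTF-16 code unit. It assumes the file is little-endian and that only ASCII-range text matters; anything else is replaced with '?'. The name read_utf16le_ascii is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read UTF-16LE text from fp and store the ASCII-range
   characters in out as a NUL-terminated string. Returns the number of
   characters stored. Code units outside ASCII (including each half of a
   surrogate pair) become '?'. A leading byte order mark is skipped. */
size_t read_utf16le_ascii(FILE *fp, char *out, size_t cap)
{
    size_t n = 0;
    int lo = 0, hi = 0;
    int first = 1;

    assert(fp != NULL && out != NULL && cap > 0);

    /* Read the low byte, then the high byte, of each 16-bit code unit.
       A trailing odd byte is dropped. */
    while (n + 1 < cap &&
           (lo = fgetc(fp)) != EOF &&
           (hi = fgetc(fp)) != EOF) {
        unsigned unit = (unsigned)lo | ((unsigned)hi << 8);

        if (first) {
            first = 0;
            if (unit == 0xFEFF)
                continue;   /* skip the BOM (bytes FF FE on disk) */
        }
        out[n++] = (unit < 0x80) ? (char)unit : '?';
    }
    out[n] = '\0';
    return n;
}
```

Open the file with fopen("file.txt", "rb") so the Windows runtime does not translate line endings in the middle of a code unit, then call read_utf16le_ascii(fp, buf, sizeof buf) and print buf. For text beyond ASCII, wide-character I/O or a proper conversion routine would be a better fit than this sketch.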

unwind
  • Well, echo also produces files that look like that. Notepad++ wrote a file that was OK, but when it opens the "bad" one it saves it in the same bad way. That's because the encoding is UCS-2 Little Endian. When I switch to ANSI it runs OK; with UTF-8, other characters appear at the beginning. – Guilherme Viebig Oct 23 '13 at 12:57
  • Any thoughts on using fgetc to read these encoded files with the least possible effort? Is there an implementation or library that takes care of that? Thank you! – Guilherme Viebig Oct 23 '13 at 13:01

Try this code

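#include <stdio.h>

int main(void)
{
   int i;
   FILE *fp;

   fp = fopen("file.txt","r");
   if (fp == NULL)   /* fopen fails if the file is missing or unreadable */
   {
      perror("file.txt");
      return 1;
   }

   /* i is an int so that EOF can be told apart from any byte value */
   while ((i = fgetc(fp)) != EOF)
   {
      printf("%c",i);
   }

   fclose(fp);
   return 0;
}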