I've got a text file, foo.txt, with these contents:
R⁸2
I had a large program reading it and doing things with each character, but it always received EOF when it hit the ⁸. Here are the relevant portions of the code:
setlocale(LC_ALL,"");
FILE *in = fopen(argv[1],"r");
while (1) {
    wint_t c = getwc(in);
    printf("%d ",wctob(c));
    if (c == -1)
        printf("Error %d: %s\n",errno,strerror(errno));
    if (c == WEOF)
        return 0;
}
It prints 82 -1 (the ASCII code for R, followed by EOF). No matter where I put the ⁸ in the file, it always reads as EOF. Edit: I added a check for errno and it gives this:
Error 84: Invalid or incomplete multibyte or wide character
However, ⁸ is Unicode U+2078 'SUPERSCRIPT EIGHT'. I wrote it to foo.txt via cat and copy-pasting from fileformat.info. A hexdump of foo.txt shows:
0000000: 52e2 81b8 32 R...2
What's the problem?