
I was testing the interaction of fseek and fgetpos (more precisely, whether I can get an fpos_t that points inside a multibyte character) and ran into a pretty unexpected situation.
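To make "inside a multibyte character" concrete: in a UTF-8 locale the wide character U+00A2 (CENT SIGN) is stored as the two bytes 0xC2 0xA2, so byte offset 1 falls between them. The hand-rolled encoder below is a hypothetical helper (`utf8_encode` is not a glibc function) that only illustrates the bit arithmetic; glibc does its own conversion internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of glibc): encode one Unicode code point
 * as UTF-8 into out[], returning the number of bytes written (1-4),
 * or 0 if cp is a surrogate or above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                  /* 11110xxx + three continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For 0x00A2 this returns 2 with the bytes 0xC2 0xA2. The second byte has the continuation pattern 10xxxxxx, so a decoder that starts at offset 1 sees an invalid lead byte.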

Whenever I call setlocale(LC_CTYPE, "C.UTF-8"); and write with fputwc, fseek seems to stop working, and the only way to move the file position is to read with fgetwc.
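One detail that matters here: on a not-yet-oriented stream, fputwc also makes the stream wide-oriented, and from then on the positioning rules for wide streams apply. A small sketch using the standard fwide query (with tmpfile standing in for /tmp/wc.test; `orientation_after_fputwc` is a hypothetical helper name):

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: report a fresh stream's orientation before and
 * after one fputwc call. fwide(fp, 0) only queries: > 0 means wide,
 * < 0 means byte, 0 means not yet oriented. */
int orientation_after_fputwc(int *before, int *after)
{
    FILE *fp = tmpfile();
    if (fp == NULL) {
        perror("tmpfile");
        return -1;
    }
    *before = fwide(fp, 0);      /* expected 0: no I/O performed yet */
    if (fputwc(L'A', fp) == WEOF) {
        perror("fputwc");
        fclose(fp);
        return -1;
    }
    *after = fwide(fp, 0);       /* expected > 0: now wide-oriented */
    fclose(fp);
    return 0;
}
```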

The code is below. All calls complete successfully (setlocale, fseek, fputwc, etc.); for brevity I stripped the return-value checks.

This happens on Ubuntu with glibc 2.16. Does anyone have a good explanation why this happens? Is this a bug in glibc?
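Since the behaviour turned out to depend on the glibc release (the last comments point to glibc bug 14543, reported there as fixed in 2.17), it can help to check which glibc the program actually runs against. A sketch using glibc's `gnu_get_libc_version`; `glibc_at_least` is a hypothetical helper, and on a non-glibc libc it simply reports -1.

```c
#include <stdio.h>              /* also pulls in <features.h>, defining __GLIBC__ */
#if defined(__GLIBC__)
#include <gnu/libc-version.h>
#endif

/* Hypothetical helper: returns 1 if the running C library is glibc
 * >= major.minor, 0 if it is an older glibc, and -1 if it is not glibc
 * or the version string cannot be parsed. */
int glibc_at_least(int major, int minor)
{
#if defined(__GLIBC__)
    int maj = 0, min = 0;
    if (sscanf(gnu_get_libc_version(), "%d.%d", &maj, &min) != 2)
        return -1;
    return (maj > major || (maj == major && min >= minor)) ? 1 : 0;
#else
    (void)major;
    (void)minor;
    return -1;
#endif
}
```

Calling `glibc_at_least(2, 17)` distinguishes a 2.16 runtime like the one in the question from a newer one.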

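```c
#define _XOPEN_SOURCE 700     // expose uselocale()/LC_GLOBAL_LOCALE (POSIX.1-2008)
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    // Return-value checks stripped for brevity; every call succeeds.
    setlocale(LC_CTYPE, "C.UTF-8");
    uselocale(LC_GLOBAL_LOCALE);

    FILE* fp = fopen("/tmp/wc.test", "w+");

    wchar_t wc = 0x00a2;      // U+00A2, two bytes (C2 A2) in UTF-8
    fputwc(wc, fp);
    fflush(fp);

    rewind(fp);

    long ftell_out;
    fpos_t fpos_out;

    fseek(fp, 1, SEEK_SET);   // looks like it doesn't have any effect
    ftell_out = ftell(fp);    // ftell_out is 0
    fgetpos(fp, &fpos_out);   // the (inner) offset of fpos_out is 0 as well

    fgetwc(fp);               // reads wc (0x00a2) as if we were still at offset 0
    ftell_out = ftell(fp);    // this is 2
    fgetpos(fp, &fpos_out);   // this is 2

    fclose(fp);
    return 0;
}
```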
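Because fseek returned 0 even though the position did not move, checking its return value alone would not have caught this. A minimal sketch that cross-checks with ftell; `seek_and_verify` is a hypothetical name. Keep in mind that for wide-oriented streams POSIX only sanctions SEEK_SET to 0 or to an earlier ftell result, so a mismatch there is not by itself proof of a library bug.

```c
#include <stdio.h>

/* Hypothetical helper: seek to an absolute offset and confirm via ftell
 * that the position actually moved. Returns 0 on success, -1 if fseek or
 * ftell reports an error, and 1 if both "succeed" but the reported
 * position differs from the requested one (the symptom in the question). */
int seek_and_verify(FILE *fp, long offset)
{
    if (fseek(fp, offset, SEEK_SET) != 0) {
        perror("fseek");
        return -1;
    }
    long pos = ftell(fp);
    if (pos == -1L) {
        perror("ftell");
        return -1;
    }
    if (pos != offset) {
        fprintf(stderr, "fseek to %ld returned 0 but ftell reports %ld\n",
                offset, pos);
        return 1;
    }
    return 0;
}
```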

Some notes:

  • if I close the file and reopen it in read mode, everything works as expected (after fseek, ftell_out/fpos_out are 1, and fgetwc fails with a proper errno since the position is inside a multibyte character)

  • if I don't call setlocale, the output is almost as expected, except that fgetwc no longer sets errno.
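The first note (close, reopen read-only, then seek) can be sketched as follows. `read_from_mid_character` is a hypothetical helper and the path is up to the caller; it assumes the C.UTF-8 locale is installed and returns -1 if it is not. Note that it calls setlocale, which changes LC_CTYPE for the whole program.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write U+00A2 in a UTF-8 locale, close the file,
 * reopen it read-only, seek to byte 1 (between 0xC2 and 0xA2) and try to
 * read a wide character. Returns the errno left by a failed fgetwc
 * (EILSEQ is what the note describes), 0 if fgetwc succeeded, or -1 if
 * any setup step failed. */
int read_from_mid_character(const char *path)
{
    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL)
        return -1;

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fputwc(0x00A2, fp) == WEOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "r");            /* fresh, unoriented stream */
    if (fp == NULL)
        return -1;
    if (fseek(fp, 1, SEEK_SET) != 0 || ftell(fp) != 1) {
        fclose(fp);
        return -1;
    }

    errno = 0;
    wint_t c = fgetwc(fp);            /* 0xA2 alone is not a valid lead byte */
    int err = (c == WEOF) ? errno : 0;
    fclose(fp);
    return err;
}
```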

Calin
  • Have you checked what `fseek` returns? And yes, it can fail and return `-1` (in which case you should check `errno`). – Some programmer dude May 12 '14 at 15:58
  • What's the return value of `setlocale` (returns `NULL` on error) and `uselocale` (returns `(locale_t)0` on error)? In general, you're not checking any return values. If a function returns an error indication, then call `perror` for more info. – ooga May 12 '14 at 15:59
  • fseek returns 0 and setlocale returns "C.UTF-8". In general all calls complete successfully. I didn't add checks to keep the snippet clear. – Calin May 12 '14 at 16:12
  • From POSIX: _If the stream is to be used with wide-character input/output functions, the application shall ensure that offset is either 0 or a value returned by an earlier call to ftell() on the same stream and whence is SEEK_SET._ – ninjalj May 12 '14 at 18:09
  • @ninjalj Though it doesn't explain what happens if that's not the case. And in general, how can they "check" that the offset was returned by a previous ftell? For what it's worth, I already tried setting the offset to match the beginning of the next multibyte character and got the same unexpected results. The implementation is pretty hard to follow... – Calin May 12 '14 at 23:07
  • @Calin: If that's not the case, you have Undefined Behavior. In general, when working with text streams, you should treat the result of `ftell()` as an opaque cookie. POSIX relaxes that requirement for non-wide text streams, given that POSIX files are plain byte streams where newlines are represented as single characters, so there is a 1:1 mapping between _file position indicator_ and character number. – ninjalj May 12 '14 at 23:38
  • @ninjalj Thanks for the replies. Totally agree with everything you said. I'm trying to understand what happens behind the scenes (even if it's just implementation details) and what sort of implementation could actually lead to this behaviour. I'm already following the code, but it's fairly time-consuming and I thought I'd try my luck here. – Calin May 12 '14 at 23:52
  • @ninjalj It doesn't help if one changes `fseek` to use the result of a previous `ftell`. [A minimal demonstration](http://coliru.stacked-crooked.com/a/a37106600980b980). The result is different if you change the locale to C. – n. m. could be an AI May 13 '14 at 10:13
  • @ninjalj The file will contain multibyte characters according to the locale, the library does any necessary transcoding. – n. m. could be an AI May 13 '14 at 14:49
  • @n.m.: OK, deleted my confus{ed,ing} comment – ninjalj May 13 '14 at 15:33
  • I'm wondering if it's related to https://sourceware.org/bugzilla/show_bug.cgi?id=16532 , https://sourceware.org/bugzilla/show_bug.cgi?id=16605 , or https://sourceware.org/bugzilla/show_bug.cgi?id=14543 – ninjalj May 14 '14 at 12:55
  • It seems so. This one, https://sourceware.org/bugzilla/show_bug.cgi?id=14543 , is more likely to be the culprit (it was fixed in 2.17 and I used 2.16 for my tests). If I link against 2.19/master, the test executes successfully. Thanks! If you add this as an answer I'll mark it as accepted. – Calin May 14 '14 at 15:44
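ninjalj's advice to treat positions on wide streams as opaque cookies maps directly onto fgetpos/fsetpos: per the C standard, the fpos_t stored for a wide-oriented stream also records the stream's mbstate_t conversion state. A minimal sketch of that pattern; `reread_after_fsetpos` is a hypothetical helper.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: save the position with fgetpos, read two wide
 * characters, restore the saved position with fsetpos and read again.
 * The fpos_t is used only as an opaque cookie, never as a byte count.
 * Returns 1 if the re-read character matches the first read, 0 if not,
 * and -1 on error. */
int reread_after_fsetpos(FILE *fp)
{
    fpos_t mark;
    if (fgetpos(fp, &mark) != 0) {
        perror("fgetpos");
        return -1;
    }

    wint_t first = fgetwc(fp);
    if (first == WEOF)
        return -1;
    (void)fgetwc(fp);                 /* move further along */

    if (fsetpos(fp, &mark) != 0) {
        perror("fsetpos");
        return -1;
    }
    wint_t again = fgetwc(fp);
    return again == first ? 1 : 0;
}
```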
