I am trying to read a text file in Shift-JIS encoding with fwscanf() in C, like this:

_locale_t JapaneseLocale=_create_locale(LC_ALL, "ja_JP");
FILE *inoto,*outoto;
double oto[LineCount][5];
int CurrentReading=0;
wchar_t Filename[LineCount][MAX_PATH];
wchar_t Alias[LineCount][MAX_PATH];
setlocale(LC_ALL,"ja_JP");
inoto=_wfopen(L"oto.ini",L"r");
while(fwscanf(inoto,L"%S.wav=%[^L','],%lf,%lf,%lf,%lf,%lf%*C",&Filename[CurrentReading][0],&Alias[CurrentReading][0],&oto[CurrentReading][0],&oto[CurrentReading][1],&oto[CurrentReading][2],&oto[CurrentReading][3],&oto[CurrentReading][4])==7) 
++CurrentReading;
fclose(inoto);

Every line in the file has this format:
_あ_い_う_え_お.wav=- あB2,568.613,375.0,-583.333,250.0,83.333
The arrays were created with the correct length, and no error occurred during compilation or at runtime, but the arrays remained empty after the operation.
However, I have been able to use this block of code to determine the number of lines in the file correctly:

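wint_t tempchar; /* wint_t, not wchar_t, so that WEOF can be represented */
int LineCount=0;
while((tempchar=fgetwc(inoto))!=WEOF) /* read first, then test, so tempchar is never used uninitialized */
{
if(tempchar==L'\n') ++LineCount;
}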

I am also not sure whether %[^L','] is handled correctly with wide characters. My compiler is MinGW-w64 on Windows 10.
Thanks in advance!

Edit: Thank you, Jonathan Leffler and phuclv, for the corrections and suggestions! I dug deeper into this issue and found that I can do this without wide characters at all, by calling SetConsoleOutputCP (declared in windows.h) and using the ordinary narrow-character input/output functions. One important detail: the correct locale name for Shift-JIS on Windows is "Japanese_Japan.932". However, since SetConsoleOutputCP is Windows-specific, I would still like to know how to implement the same functionality on other platforms such as Linux, using only standard C functions. It seems that the printf family of functions treats output strings as UTF-8, regardless of the global locale setting.

SuibianP
  • You should capture the actual value returned by `fwscanf()` and report it. That will tell you how many of the conversions succeeded, and may point to where the trouble is. If you're getting zero returned, the whole scan is failing; if you're getting some other number, then some conversions work and others don't. With SJIS, could you be running into problems because the 'unshift' characters are not being processed as you expect/need? – Jonathan Leffler Nov 17 '19 at 04:31
  • You might also find it better to use `fgetws()` to read a line and then `swscanf()` to parse the data from the line. – Jonathan Leffler Nov 17 '19 at 05:01
  • Looking at your format string, I think your use of `L"%S.wav=%[^L','],%lf,%lf,%lf,%lf,%lf%*C"` is dubious: the embedded `L` doesn't do what you expect. You probably meant `L"%[^.].wav=%[^,],…"`. The `%S` reads characters until the next white space; using `%[^.]` reads up to the first dot. The `%[^L',']` means "anything except L, apostrophe or comma", which probably isn't what you had in mind. The first change is likely the crucial one. Do protect your strings from overflow. Use a suitable number like the expanded value of `MAX_PATH - 1` in the format string to limit how much data is read (`%39[^.]`). – Jonathan Leffler Nov 17 '19 at 05:06
  • Also, avoid overly long lines; no one likes horizontal scrolling. Besides, add spaces around operators and after `,` to make the code readable. – phuclv Nov 17 '19 at 09:06
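
The suggestions in the comments above (scansets instead of `%S`, a width limit on each string, and parsing one line at a time with `swscanf()`) could be combined roughly as follows. This is only a minimal sketch, not a tested fix for the original program: `parse_oto_line` and `OTO_NAME_LEN` are hypothetical names introduced here, and 260 is assumed as a stand-in for `MAX_PATH`. The `l` in `%259l[^.]` asks for `wchar_t` storage; in standard C a bare `%[` inside a wide scan function stores multibyte `char` data instead, so the explicit `l` is used on the assumption that it is accepted by both MinGW-w64 and other C libraries.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical buffer size, assumed equal to MAX_PATH (260) on Windows. */
#define OTO_NAME_LEN 260

/*
 * Parse one line of the form
 *   <filename>.wav=<alias>,<v0>,<v1>,<v2>,<v3>,<v4>
 * filename and alias must each point to at least OTO_NAME_LEN wide characters.
 * Returns the number of successful conversions: 7 means the whole line parsed.
 */
static int parse_oto_line(const wchar_t *line, wchar_t *filename,
                          wchar_t *alias, double v[5])
{
    return swscanf(line,
                   /* %259l[^.] : up to 259 wide chars, stopping at the first dot  */
                   /* %259l[^,] : up to 259 wide chars, stopping at the first comma */
                   L"%259l[^.].wav=%259l[^,],%lf,%lf,%lf,%lf,%lf",
                   filename, alias, &v[0], &v[1], &v[2], &v[3], &v[4]);
}
```

Each line would first be read with `fgetws()` into a wide buffer and then passed to `parse_oto_line`; checking for a return value of 7, and printing it otherwise, shows how far the scan got on a line that fails.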
