Write wchar_t* to file - works only for some characters

Question

I have methods that return Unicode text, and I need to write that text to a file, but some characters are not written. I have the following:

const wchar_t* getStandardText() {
    return L"test";
}

const wchar_t* getUnicodeText()
{
    return L"testíček";
}

int main()
{
    FILE *file = fopen(FILE_NAME, "a");

    fputws(getStandardText(), file);
    fputws(getUnicodeText(), file);

    fclose(file);
}

Output in file:

testtestí

What confuses me even more is that some characters, like "í", work while others, like "č", do not.

I am on Windows with VS 2015 Pro.
To read the file I use Notepad++, which tells me the file has ANSI encoding.

Your function `getASCII` has quite a misleading name. Also, what system are you running this program on? And how do you check the contents of the file? — Some programmer dude, Sep 10 '18 at 08:42
What encoding is your source code in and how are you compiling it? — melpomene, Sep 10 '18 at 08:47
A file is a sequence of **bytes**. How those bytes are interpreted as characters can vary. So unless you can say how you are checking the output of your file, it's impossible to answer your question. Questions about international characters are very difficult because most newbies don't know how to ask the right question. They just expect characters to work, but it's not as simple as that. — john, Sep 10 '18 at 08:54
You may need the ccs setting. See the Microsoft docs here: https://learn.microsoft.com/en-gb/cpp/c-runtime-library/reference/fopen-wfopen — Hitobat, Sep 10 '18 at 08:57
"ANSI" is not an encoding. (I know Windows likes to claim otherwise, but it's wrong.) — melpomene, Sep 10 '18 at 08:58
_"...which tells me the file has ANSI encoding..."_ Unless the file has a BOM, Notepad has to guess the encoding using a combination of heuristics; this is error-prone. Inspect the file in a hex editor to see the actual bytes written. — Richard Critten, Sep 10 '18 at 09:00
"í" (U+00ED) is part of the 8-bit "extended ASCII" (Latin-1) range, while "č" (U+010D) is not. That probably explains why it works. http://www.asciitable.com/ — VLL, Sep 10 '18 at 09:01
@RichardCritten Notepad++ is not Notepad. Notepad++ does a smarter check on the file content (but not infallible of course) — john, Sep 10 '18 at 09:03
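
The point about character ranges in the comments above can be made concrete. "í" is U+00ED and "č" is U+010D, so only the first fits in a single byte. The sketch below assumes (this is not confirmed anywhere in the thread) that a narrow text stream in the default "C" locale of the Microsoft runtime narrows each wide character to one byte and stops at the first value above 0xFF. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Assumption: one wide character can be narrowed to one byte only when
   its value is at most 0xFF. */
static bool fits_in_one_byte(wchar_t ch)
{
    return (unsigned long)ch <= 0xFFul;
}

/* Counts the leading characters of `text` that fit in one byte, i.e. how
   much of the string would be written before the first failure under the
   assumption above. */
static size_t writable_prefix_length(const wchar_t *text)
{
    size_t n = 0;

    assert(text != NULL);
    while (text[n] != L'\0' && fits_in_one_byte(text[n])) {
        ++n;
    }
    return n;
}
```

For the code points of "testíček" the prefix length is 5, which corresponds to the "testí" seen in the question's output.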

selbie · Accepted Answer · 2018-09-10T20:13:04.497

3

This works on Windows: change the mode parameter to specify an explicit encoding (the `ccs=` flag is an extension of the Microsoft C runtime).

FILE *file = fopen("foobar.txt", "a+, ccs=UTF-16LE");

OR

FILE *file = fopen("foobar.txt", "a+, ccs=UTF-8");

That appears to make the runtime write a byte-order mark at the start of a newly created file (FF FE for UTF-16LE, EF BB BF for UTF-8) to indicate that the file's text is Unicode.
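
To confirm which byte-order mark, if any, ended up in the file without relying on an editor's guess, the first bytes can be read back in binary mode. This is a minimal sketch; `bom_kind` and `detect_bom` are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { BOM_UNREADABLE, BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE } bom_kind;

/* Reads up to three bytes from the start of the file and compares them
   with the well-known BOM byte sequences. Note that FF FE is also how a
   UTF-32LE BOM begins; this sketch does not distinguish that case. */
static bom_kind detect_bom(const char *path)
{
    unsigned char head[3] = { 0, 0, 0 };
    size_t got;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");          /* binary mode: no translation of bytes */
    if (f == NULL) {
        return BOM_UNREADABLE;
    }
    got = fread(head, 1, sizeof head, f);
    fclose(f);

    if (got >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF) {
        return BOM_UTF8;
    }
    if (got >= 2 && head[0] == 0xFF && head[1] == 0xFE) {
        return BOM_UTF16LE;
    }
    if (got >= 2 && head[0] == 0xFE && head[1] == 0xFF) {
        return BOM_UTF16BE;
    }
    return BOM_NONE;
}
```

Calling `detect_bom("foobar.txt")` after the program has run should then report `BOM_UTF16LE` or `BOM_UTF8`, depending on which mode string was used, provided the file did not already exist without a BOM.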

edited Sep 10 '18 at 20:13

answered Sep 10 '18 at 09:02

selbie

score 1 · Answer 2 · edited Sep 11 '18 at 09:42

1

The file has to be created with an appropriate BOM. The following is the preferred way; make sure that you write only text that can be encoded as UTF-8, and open the file in Notepad++ to view it.

FILE *file = fopen("test.txt", "a+, ccs=UTF-8");
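
When checking the result in a hex editor, it helps to know which bytes to expect for each character. The sketch below implements the standard UTF-8 encoding of a single code point; `utf8_encode` is a hypothetical helper for illustration, not something the runtime exposes, and it is not how the `ccs=UTF-8` mode is implemented internally.

```c
#include <assert.h>
#include <stddef.h>

/* Encodes one Unicode code point as UTF-8 into `out` (room for 4 bytes).
   Returns the number of bytes written, or 0 if the value is a surrogate
   or lies above U+10FFFF and therefore cannot be encoded. */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp <= 0x7Ful) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FFul) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800ul && cp <= 0xDFFFul) {
        return 0;                   /* surrogates are not valid scalar values */
    }
    if (cp <= 0xFFFFul) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFul) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this encoding, "í" (U+00ED) becomes the bytes C3 AD and "č" (U+010D) becomes C4 8D, so a correctly written UTF-8 file should contain those sequences after the BOM.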

edited Sep 11 '18 at 09:42

VLL

answered Sep 10 '18 at 09:40

GSAT