I have functions that return Unicode text, and I need to write that text to a file, but some characters are not written. I have the following:

const wchar_t* getStandardText() {
    return L"test";
}

const wchar_t* getUnicodeText()
{
    return L"testíček";
}

int main()
{
    FILE *file = fopen(FILE_NAME, "a");

    fputws(getStandardText(), file);
    fputws(getUnicodeText(), file);

    fclose(file);
}

Output in file:

testtestí

Even more confusing to me is that some characters like "í" work while others like "č" do not.

  • I am on Windows with VS 2015 Pro.
  • To read the file I use Notepad++, which tells me the file has ANSI encoding.
Erik Šťastný
  • Your function `getASCII` has quite a misleading name. Also, what system are you running this program on? And how do you check the contents of the file? – Some programmer dude Sep 10 '18 at 08:42
  • What encoding is your source code in and how are you compiling it? – melpomene Sep 10 '18 at 08:47
  • A file is a sequence of **bytes**. How those bytes are interpreted as characters can vary. So unless you can say how you are checking the output of your file, it's impossible to answer your question. Questions about international characters are very difficult because most newbies don't know how to ask the right question. They just expect characters to work but it's not as simple as that. – john Sep 10 '18 at 08:54
  • I have added more information into question, thanks. – Erik Šťastný Sep 10 '18 at 08:56
  • You may need the `ccs` setting in the mode string. See the Microsoft docs here: https://learn.microsoft.com/en-gb/cpp/c-runtime-library/reference/fopen-wfopen – Hitobat Sep 10 '18 at 08:57
  • "ANSI" is not an encoding. (I know Windows likes to claim otherwise, but it's wrong.) – melpomene Sep 10 '18 at 08:58
  • _"...which tells me the file has ANSI encoding..."_ Unless the file has a BOM, Notepad has to guess the encoding using a combination of heuristics; this is error prone. Inspect the file in a hex editor to see the actual bytes written. – Richard Critten Sep 10 '18 at 09:00
  • "í" is part of extended ASCII (for example, it is byte 0xED in Latin-1), while "č" is not. That probably explains why it works. http://www.asciitable.com/ – VLL Sep 10 '18 at 09:01
  • @RichardCritten Notepad++ is not Notepad. Notepad++ does a smarter check on the file content (but not infallible of course) – john Sep 10 '18 at 09:03
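
The advice above to look at the actual bytes can also be followed without a hex editor. Below is a minimal sketch in C; `hex_dump_file` is a hypothetical helper written for this illustration, not part of any library. It opens the file in binary mode and formats the first few bytes as hex pairs.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats up to max_bytes bytes of the file at `path`
 * into `out` as space-separated hex pairs, e.g. "FF FE 74 00".
 * Returns the number of bytes formatted, or -1 if the file cannot be
 * opened or the output buffer has zero size. */
long hex_dump_file(const char *path, char *out, size_t out_size, size_t max_bytes)
{
    FILE *f;
    size_t used = 0;
    long count = 0;
    int c;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    f = fopen(path, "rb"); /* binary mode: no newline or encoding translation */
    if (f == NULL)
        return -1;

    /* Each byte needs at most 3 characters (" XX") plus the terminating NUL,
     * so stop reading as soon as fewer than 4 bytes of buffer remain. */
    while ((size_t)count < max_bytes && used + 4 <= out_size &&
           (c = fgetc(f)) != EOF) {
        used += (size_t)snprintf(out + used, out_size - used,
                                 count ? " %02X" : "%02X", (unsigned)c);
        count++;
    }

    fclose(f);
    return count;
}
```

Printing the buffer for the output file shows whether it starts with a byte-order mark: FF FE for UTF-16LE, EF BB BF for UTF-8, or neither.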

2 Answers

This works on Windows: change your mode parameter to include an explicit encoding.

FILE *file = fopen("foobar.txt", "a+, ccs=UTF-16LE");

OR

FILE *file = fopen("foobar.txt", "a+, ccs=UTF-8");

That appears to write a byte-order mark at the start of the file (FF FE for UTF-16LE, EF BB BF for UTF-8) to indicate that the file's text is Unicode.
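
For completeness, here is a sketch of how the mode string fits into the code from the question. `append_wide_text` is a hypothetical wrapper name introduced only for this example. The `ccs=` suffix is a Microsoft CRT extension, so the sketch assumes the Microsoft compiler and runtime; other C libraries may ignore the suffix or treat it differently.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical wrapper: appends wide text to `path`, asking the Microsoft CRT
 * to store it as UTF-8 via the "ccs=UTF-8" mode suffix.
 * Returns 0 on success and -1 on failure. */
int append_wide_text(const char *path, const wchar_t *text)
{
    int rc = 0;
    FILE *file = fopen(path, "a+, ccs=UTF-8");

    if (file == NULL)
        return -1;

    /* fputws returns a negative value when a character cannot be written. */
    if (fputws(text, file) < 0)
        rc = -1;

    /* fclose flushes buffered output, so its result matters too. */
    if (fclose(file) != 0)
        rc = -1;

    return rc;
}
```

A call such as `append_wide_text("foobar.txt", L"test\u00ED\u010Dek");` writes the text from the question. The `\u` escapes are used here so that the literal does not depend on the encoding of the source file, and checking the return value makes a failed write visible instead of silently truncating the output.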

selbie
The file has to be created with an appropriate BOM. The following is the preferred way; make sure you write only UTF-8 characters to the file, and open it in Notepad++ to view it.

FILE *file = fopen("test.txt", "a+, ccs=UTF-8");
VLL
GSAT