I have a few lines that contain special characters, like these:

CommunautéFinancièreAfricaineBEACFranc(XAF) CommunautéFinancièreAfricaineBCEAOFranc(XOF)

But, when I write those lines into a text file, I get this as a result:

CFACommunaut�Financi�reAfricaineBEACFranc CFACommunaut�Financi�reAfricaineBCEAOFranc

This is how I write the lines:

File.WriteAllLines(@"c:\file5.txt", lines);

I also tried passing an Encoding as the third parameter of File.WriteAllLines(), but that didn't help either.

File.WriteAllLines(@"c:\file5.txt", lines, Encoding.UTF8);
File.WriteAllLines(@"c:\file5.txt", lines, Encoding.ASCII);

This is how I read all lines:

File.ReadAllLines(@"C:\file4.txt").ToList()
        .ForEach(g => 
            lines.Add(g.ToString()
            .Replace("/", string.Empty)
            .Replace("(", string.Empty)
            .Replace(")", string.Empty))
        );

The crazy thing is, the characters are displayed perfectly fine in another text file (file4.txt) where I read everything in.

Quoter
  • You're on the right track with the encoding, I think. Not sure which one you need, but it should be one that handles those special characters. – Tim May 25 '14 at 12:30
  • I'm tempted to close this as a duplicate (e.g. /questions/1025332/determine-a-strings-encoding-in-c-sharp or http://stackoverflow.com/questions/5864272/understanding-text-encoding-in-net). All you have to do is understand what an encoding is, and find the correct one. – usr May 25 '14 at 12:33
  • How do you obtain the result? Is your text file editor Unicode-aware? – CodeCaster May 25 '14 at 12:39
  • @CodeCaster, it's just Notepad. I don't know if it's Unicode-aware. – Quoter May 25 '14 at 12:46
  • @Tim, I've tried them all, but I find it very strange that everything is fine in one text file while the other one gives me this result. – Quoter May 25 '14 at 12:46

1 Answer

Notepad does not recognize UTF-8 unless you add a BOM.

You have three choices:

  1. Decide Notepad is dumb and ignore the issue. Better editors will recognize UTF-8.
  2. Add a BOM. Create var utf8WithBom = new UTF8Encoding(true, true) and pass it as the third argument of File.WriteAllLines().
  3. Use the legacy encoding. Specify Encoding.Default as the third argument.

    The resulting file will corrupt characters not present in the current code page. It won't display correctly on systems that use a different code page, as is common in certain other countries.

    IMO this one is a bad idea, but I still mention it for completeness.
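
Option 2 could look roughly like the sketch below. Note that the class in System.Text is spelled UTF8Encoding. The temp-file path and the class name BomDemo are placeholders that are not from the question, and the sample strings are shortened versions of the lines shown there. The check at the end only confirms that the file starts with the UTF-8 BOM bytes EF BB BF, which is what Notepad looks for.

```csharp
using System;
using System.IO;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // \u00e9 = é, \u00e8 = è; escapes keep the sketch independent of
        // how this source file itself is encoded.
        string[] lines =
        {
            "Communaut\u00e9Financi\u00e8reAfricaineBEACFranc",
            "Communaut\u00e9Financi\u00e8reAfricaineBCEAOFranc"
        };

        // Hypothetical output path; a temp file keeps the sketch self-contained.
        string path = Path.Combine(Path.GetTempPath(), "file5.txt");

        // First argument: emit the byte order mark.
        // Second argument: throw on invalid data instead of silently replacing it.
        var utf8WithBom = new UTF8Encoding(true, true);
        File.WriteAllLines(path, lines, utf8WithBom);

        // Inspect the raw bytes to see whether the BOM was written.
        byte[] bytes = File.ReadAllBytes(path);
        bool hasBom = bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        Console.WriteLine("BOM present: " + hasBom);
    }
}
```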

CodesInChaos
  • It had something to do with the BOM. I eventually converted the text from ANSI (IIRC) to UTF-8 with BOM. I did this conversion with Notepad++ via the Encoding menu. – Quoter May 25 '14 at 20:14
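
If the input file really was ANSI, as the last comment suggests, an alternative to converting it by hand might be to name the encoding when reading. The sketch below is only an illustration under assumptions: it uses ISO-8859-1 as a stand-in for the unknown ANSI code page (the two accented letters in the question have the same byte values in ISO-8859-1 and Windows-1252), and the temp-file path and the class name ReadEncodingDemo are made up. It writes a small Latin-1 file, then reads it once with the default decoding and once with the explicit encoding.

```csharp
using System;
using System.IO;
using System.Text;

class ReadEncodingDemo
{
    static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the real ANSI code page,
        // which the question does not state.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string expected = "Communaut\u00e9Financi\u00e8re";

        // Hypothetical input file, written here so the sketch is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "file4.txt");
        File.WriteAllText(path, expected, latin1);

        // Without an encoding argument, ReadAllLines decodes as UTF-8 (no BOM
        // is present to say otherwise), so the single bytes for the accented
        // letters are invalid and become U+FFFD.
        string wrong = File.ReadAllLines(path)[0];

        // With the matching encoding, the text round-trips.
        string right = File.ReadAllLines(path, latin1)[0];

        Console.WriteLine("Default read contains U+FFFD: " + (wrong.IndexOf('\uFFFD') >= 0));
        Console.WriteLine("Explicit read matches: " + (right == expected));
    }
}
```

Once the replacement character is in the string, no choice of encoding at write time can bring the original letter back, which would explain why changing the third argument of File.WriteAllLines() made no difference.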