I have text containing characters that cannot be represented in the windows-1251 charset. For example:

中华全国工商业联合会-HelloWorld

I have a method for converting from UTF8 to windows-1251:

static string ChangeEncoding(string text)
{
    if (text == null || text == "")
        return "";
    Encoding win1251 = Encoding.GetEncoding("windows-1251");
    Encoding ascii = Encoding.UTF8;
    byte[] utfBytes = ascii.GetBytes(text);
    byte[] isoBytes = Encoding.Convert(ascii, win1251, utfBytes);
    return win1251.GetString(isoBytes);
}

Now it is returning this:

??????????-HelloWorld

I don't want to show the characters that could not be converted to windows-1251 correctly. In this case I want just:

-HelloWorld

How can I do this?

Dilshod K
  • `string.Replace("?","");` – Neil May 09 '21 at 16:44
  • @Neil Not a good idea. The original text might contain `?` chars. – 41686d6564 stands w. Palestine May 09 '21 at 16:45
  • @Neil What if the string already contains a '?' char, like 中华全国工商业联合会-?HelloWorld – Dilshod K May 09 '21 at 16:45
  • Why are you converting into UTF-8 (which *isn't* the same as ASCII) and then converting from that to Windows-1251? Your text is a string - it's just a sequence of Unicode characters (well, UTF-16 code units). There's no point in converting to UTF-8 first. – Jon Skeet May 09 '21 at 16:47
  • @JonSkeet This string goes to a database that uses the Windows-1251 charset. I don't want to save unknown chars in my database. I want to prevent that: if the text contains unknown chars, I will warn the user – Dilshod K May 09 '21 at 16:49
  • You can do this by using `.GetEncoder` and supplying a custom `EncoderFallback` implementation in the `.Fallback` that removes characters rather than replacing them (and you can have it set a flag if it's used, so you can detect the removal). If possible, though, changing things on the DB end so it can store Unicode would be preferable (for SQL Server that means using `NVARCHAR` rather than `VARCHAR`, many other database systems have something similar or else they support UTF-8 directly). – Jeroen Mostert May 09 '21 at 16:54
  • @JeroenMostert Can you post it as an answer? – Dilshod K May 09 '21 at 16:55
  • Doing so in a worthwhile manner means supplying the code, which is too tedious to bother with. Feel free to cook up your own using my suggestions. – Jeroen Mostert May 09 '21 at 16:57
  • @DilshodK: That doesn't explain why you're encoding it as UTF-8 first though, which is what I was asking about... – Jon Skeet May 09 '21 at 18:27
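
Jon Skeet's point in the comments can be checked with a small sketch. A .NET `string` is already a sequence of UTF-16 code units, so for well-formed text, encoding it directly to windows-1251 should produce the same bytes as the UTF-8 round trip in the question. The class and method names here are hypothetical. The `RegisterProvider` call is an assumption about the runtime: it is needed on .NET Core / .NET 5+ for code page encodings, and can be dropped on .NET Framework, where windows-1251 is available without it.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
static class EncodingComparison
{
    // Returns true if direct encoding and the UTF-8 round trip give the same bytes.
    public static bool SameBytes(string text)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page
        // encodings such as windows-1251 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding win1251 = Encoding.GetEncoding("windows-1251");

        // Direct: string (UTF-16) -> windows-1251 bytes.
        byte[] direct = win1251.GetBytes(text);

        // Round trip as in the question: string -> UTF-8 bytes -> windows-1251 bytes.
        byte[] viaUtf8 = Encoding.Convert(
            Encoding.UTF8, win1251, Encoding.UTF8.GetBytes(text));

        return direct.SequenceEqual(viaUtf8);
    }
}
```

If the two byte arrays match, the UTF-8 step adds work without changing the result, and `win1251.GetBytes(text)` alone is enough.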

1 Answer

Following @JeroenMostert's suggestion, this method helped me:

    public static string ChangeEncoding(string text)
    {
        // An empty replacement string makes the encoder drop characters that
        // windows-1251 cannot represent instead of substituting '?'.
        Encoding win1251 = Encoding.GetEncoding("windows-1251", new EncoderReplacementFallback(string.Empty), new DecoderExceptionFallback());
        return win1251.GetString(Encoding.Convert(Encoding.UTF8, win1251, Encoding.UTF8.GetBytes(text)));
    }
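
The method above silently drops unencodable characters. The comment thread also mentions warning the user when that happens, which needs a custom `EncoderFallback` that sets a flag. Below is a minimal sketch of that idea, not a tested production implementation. `DroppingEncoderFallback`, `Win1251Filter`, and `StripUnencodable` are hypothetical names. It encodes the string directly rather than going through UTF-8, and the `RegisterProvider` call is an assumption that only applies on .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical fallback: emits nothing for unencodable characters
// and records that at least one character was dropped.
sealed class DroppingEncoderFallback : EncoderFallback
{
    public bool DroppedAny { get; private set; }

    // The fallback never produces replacement characters.
    public override int MaxCharCount { get { return 0; } }

    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new DroppingBuffer(this);
    }

    private sealed class DroppingBuffer : EncoderFallbackBuffer
    {
        private readonly DroppingEncoderFallback _owner;

        public DroppingBuffer(DroppingEncoderFallback owner)
        {
            _owner = owner;
        }

        public override int Remaining { get { return 0; } }

        // Returning false tells the encoder to ignore the character.
        public override bool Fallback(char charUnknown, int index)
        {
            _owner.DroppedAny = true;
            return false;
        }

        // Same handling for an unencodable surrogate pair.
        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        {
            _owner.DroppedAny = true;
            return false;
        }

        public override char GetNextChar() { return '\0'; }

        public override bool MovePrevious() { return false; }
    }
}

// Hypothetical helper wrapping the fallback.
static class Win1251Filter
{
    public static string StripUnencodable(string text, out bool droppedAny)
    {
        droppedAny = false;
        if (string.IsNullOrEmpty(text))
            return "";

        // Assumption: needed on .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // A fresh fallback per call, so the flag reflects only this input.
        var fallback = new DroppingEncoderFallback();
        Encoding win1251 = Encoding.GetEncoding(
            "windows-1251", fallback, new DecoderExceptionFallback());

        string result = win1251.GetString(win1251.GetBytes(text));
        droppedAny = fallback.DroppedAny;
        return result;
    }
}
```

With the input from the question, `StripUnencodable("中华全国工商业联合会-HelloWorld", out dropped)` should return `-HelloWorld` with `dropped` set to `true`. A literal `?` in the input is kept, because `?` is encodable and never reaches the fallback.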
Dilshod K