How to get a list of codepages from a string

Question

I have a string containing text from different codepages: string multi = "EnglishРусский日本語";

I need to return the list of codepages:

int[] GetCodePage(string multi)
{
   return new int[] {1252, 1251, 932};
}

How do you know where one ends and another begins in the string? Seems like some kind of delimiter is in order. — itsme86, May 25 '18 at 16:14
Yes, it is probably possible... The only problem would be that there are characters that are mapped to multiple codepages (nearly all the ASCII characters for example, plus many others). How would you treat them? Other problem... which codepages do you want to handle? Do you have a closed list? — xanatos, May 25 '18 at 17:08
And for DBCS (double byte character set) even seeing what characters are mapped is complex. — xanatos, May 25 '18 at 17:10
I need to find out whether the path c:\English\Русский\日本語\file.bin contains a folder whose name is in a language other than the one set as the system locale for non-Unicode programs. — user2347380, May 25 '18 at 18:21
@user then you only need to check if the filename uses only characters from the default codepage... the windows api uses unicode and a single non unicode codepage, based on the current settings of windows — xanatos, May 26 '18 at 11:55

xanatos · Accepted Answer · 2018-05-28T14:56:25.283

From your comments, it seems that your problem is different.

If you only need to check whether a filename (a string) uses only characters from the "default codepage", then it is quite simple. The Windows API uses Unicode plus a single non-Unicode codepage, which is the default codepage for non-Unicode programs. On .NET Framework, Encoding.Default is that Windows non-Unicode codepage (on .NET Core and later, Encoding.Default is always UTF-8, so the codepage would have to be obtained another way).

public static void Main()
{
    Console.WriteLine(Encoding.Default.BodyName);

    // I live in Italy, we use the Windows-1252 as the default codepage 
    Console.WriteLine(CanBeEncoded(Encoding.Default, "Hello world àèéìòù"));

    Console.WriteLine(CanBeEncoded(Encoding.Default, "Русский"));
}

and the interesting code:

public static bool CanBeEncoded(Encoding enc, string str)
{
    // We want to modify the Encoding, so we have to clone it
    enc = (Encoding)enc.Clone();
    enc.EncoderFallback = new EncoderExceptionFallback();

    try
    {
        enc.GetByteCount(str);
    }
    catch (EncoderFallbackException)
    {
        return false;
    }

    return true;        
}

Note that this code could be optimized. Using an exception to check whether the string can be encoded isn't optimal (but it is easy to write :-) ). A better solution would be to subclass the EncoderFallback class.
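As a rough illustration of the subclassing idea, here is a minimal sketch. The names FlagEncoderFallback, FlagBuffer and EncodingCheck are hypothetical (they are not part of the framework). The assumption is that a fallback which only sets a flag and emits no replacement characters is enough to detect unencodable characters without throwing.

```csharp
using System;
using System.Text;

// Hypothetical fallback: it never substitutes anything, it only remembers
// that the encoder met a character it could not encode.
public sealed class FlagEncoderFallback : EncoderFallback
{
    public bool Triggered { get; private set; }

    // No replacement characters are ever produced.
    public override int MaxCharCount { get { return 0; } }

    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new FlagBuffer(this);
    }

    private sealed class FlagBuffer : EncoderFallbackBuffer
    {
        private readonly FlagEncoderFallback owner;

        public FlagBuffer(FlagEncoderFallback owner) { this.owner = owner; }

        public override int Remaining { get { return 0; } }

        // Returning false tells the encoder to skip the offending character.
        public override bool Fallback(char charUnknown, int index)
        {
            owner.Triggered = true;
            return false;
        }

        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        {
            owner.Triggered = true;
            return false;
        }

        public override char GetNextChar() { return '\0'; }

        public override bool MovePrevious() { return false; }
    }
}

public static class EncodingCheck
{
    public static bool CanBeEncoded(Encoding enc, string str)
    {
        var fallback = new FlagEncoderFallback();

        // Clone first: the shared Encoding instances are read-only.
        enc = (Encoding)enc.Clone();
        enc.EncoderFallback = fallback;

        enc.GetByteCount(str);
        return !fallback.Triggered;
    }
}
```

Called as EncodingCheck.CanBeEncoded(Encoding.ASCII, "Русский"), this should return false, and true for plain ASCII text. Encoding.ASCII is used here only as a portable stand-in for the system codepage. Unlike the exception version, this sketch always scans the whole string rather than stopping at the first unencodable character.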
