I have a text file saved in code page 1256.
Because Windows CE 5.0 on my device does not support that code page, I can't open the file with that encoding in .NET CF. The OS does support Unicode, though (I checked by displaying some hard-coded strings in my form).
How can I read that file and convert its data to Unicode?
How can I convert a single character to its equivalent UTF-8 bytes?
THIS LINK says that in the 1256 code page table, character number 200 (0xC8) is 0x0628. What is the relation between them? If I have 200 (0xC8), how can I obtain 0x0628?

- Is there any chance you could perform the translation somewhere else (an app server, perhaps) and just give the device data it can natively handle? Otherwise you're just reimplementing Encoding. – Marc Gravell Oct 30 '11 at 08:11
- Regarding "THIS LINK says that in 1256 code page table the character number 200/C8 is 0x0628. so what's the relation between them? if i have 200/C8 , how can i obtain the 0x0628?": there is no relationship; they are two different systems. You simply have to do what Jon suggests and build a mapping from one to the other by hand. – Adam Cameron Oct 30 '11 at 08:25
- @Jon Skeet, so it seems that Windows code pages are just simple character mappings, and the .NET Framework (or the built-in Win32 APIs) just does the same mapping Jon did? OK, thanks for your help. I'm going to implement it. – losingsleeep Oct 30 '11 at 08:50
1 Answer
It would probably be easiest just to hard-code the conversion yourself: create a char[] of 256 values, populate the first 128 positions with just the equivalent numbers, and then populate the rest manually. The "relation" between them isn't one you can derive mathematically; it's just a somewhat arbitrary assignment of values.
For example:
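// Lookup table: index = byte value in code page 1256, value = Unicode character.
private static readonly char[] CodePage1256 = GenerateCodePage1256();

private static char[] GenerateCodePage1256()
{
    char[] ret = new char[256];
    // The lower half (0-127) is identical to ASCII, so it maps to itself.
    for (int i = 0; i < 128; i++)
    {
        ret[i] = (char) i;
    }
    // The upper half (128-255) has to be listed by hand.
    // This string must end up with exactly 128 characters.
    string upperCharacters =
        "\u20ac\u067e\u201a\u0192\u201e\u2026\u2020\u2021" +
        "\u02c6\u2030"; // etc - from the Wikipedia page
    for (int i = 0; i < 128; i++)
    {
        ret[i + 128] = upperCharacters[i];
    }
    return ret;
}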
Then you have a direct byte-to-char mapping. Of course, this is a potentially error-prone process. Another possibility would be to create a file containing the mapping on a system that does have that code page.
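The file-based alternative could be sketched roughly as follows. Everything here is an assumption rather than part of the answer: the class name `Cp1256TableFile`, the method names, and the file layout (256 entries stored as UTF-16LE, 512 bytes in total) are hypothetical. `WriteTable` is meant to run on a desktop machine whose runtime actually provides code page 1256 (some runtimes need an encoding provider registered first, otherwise `Encoding.GetEncoding(1256)` throws), while `ParseTable` only needs Unicode support and takes the raw bytes, so how the file is read on the device is left to the caller.

```csharp
using System;
using System.IO;
using System.Text;

static class Cp1256TableFile
{
    // Desktop side: dump the code page 1256 table to a file.
    // Assumes the runtime can supply code page 1256.
    public static void WriteTable(string path)
    {
        byte[] allBytes = new byte[256];
        for (int i = 0; i < 256; i++)
        {
            allBytes[i] = (byte) i;
        }
        // Decode every possible byte value once to get the full mapping.
        char[] table = Encoding.GetEncoding(1256).GetChars(allBytes);
        // Store each entry as one UTF-16LE code unit (2 bytes per entry).
        File.WriteAllBytes(path, Encoding.Unicode.GetBytes(table));
    }

    // Device side: turn the 512 raw bytes back into a 256-entry lookup table.
    public static char[] ParseTable(byte[] raw)
    {
        if (raw.Length != 512)
        {
            throw new ArgumentException("Expected 512 bytes (256 UTF-16LE entries).");
        }
        return Encoding.Unicode.GetChars(raw);
    }
}
```

The resulting array can be indexed by byte value in the same way as a hand-written table.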
Anyway, once you've got the mapping, you can easily convert any array of bytes to a string or char array, at which point you can use the normal .NET classes to write out the file as UTF-8 again. For example:
using (Stream input = File.OpenRead("input.txt"))
{
    // File.CreateText writes UTF-8 encoded text.
    using (StreamWriter output = File.CreateText("output.txt"))
    {
        byte[] byteBuffer = new byte[8 * 1024];
        // One char per input byte, since code page 1256 is a single-byte encoding.
        char[] charBuffer = new char[byteBuffer.Length];
        int bytesRead;
        while ((bytesRead = input.Read(byteBuffer, 0, byteBuffer.Length)) > 0)
        {
            // Translate each byte through the lookup table.
            for (int i = 0; i < bytesRead; i++)
            {
                charBuffer[i] = CodePage1256[byteBuffer[i]];
            }
            output.Write(charBuffer, 0, bytesRead);
        }
    }
}
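For the single-character part of the question, a minimal sketch might look like the following. The class `Cp1256Sketch` and its methods are hypothetical names, and the table is deliberately incomplete: only the 0xC8 entry quoted in the question is filled in, with every other upper-half entry set to the U+FFFD replacement character as a placeholder. Once a byte has been mapped to a `char`, `Encoding.UTF8.GetBytes` produces its UTF-8 bytes.

```csharp
using System;
using System.Text;

static class Cp1256Sketch
{
    // Hypothetical, incomplete table: ASCII maps to itself, and only the
    // 0xC8 -> U+0628 entry from the question is filled in. Every other
    // upper-half slot is a U+FFFD placeholder, not real table data.
    public static char[] BuildPartialTable()
    {
        char[] table = new char[256];
        for (int i = 0; i < 128; i++)
        {
            table[i] = (char) i;
        }
        for (int i = 128; i < 256; i++)
        {
            table[i] = '\uFFFD';
        }
        table[0xC8] = '\u0628';
        return table;
    }

    // Map each input byte through the table to build a Unicode string.
    public static string Decode(byte[] bytes, char[] table)
    {
        char[] chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[i] = table[bytes[i]];
        }
        return new string(chars);
    }

    // UTF-8 bytes for a single (non-surrogate) character.
    public static byte[] ToUtf8(char c)
    {
        return Encoding.UTF8.GetBytes(new char[] { c });
    }
}
```

With this table, `Cp1256Sketch.ToUtf8(table[0xC8])` returns the two bytes 0xD8 0xA8, which is the UTF-8 encoding of U+0628.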

- So it seems that Windows code pages are just simple character mappings, and the .NET Framework (or the built-in Win32 APIs) just does the same mapping you did? OK, thanks for your help. I'm going to implement it. – losingsleeep Oct 30 '11 at 08:49
- @JonSkeet I hadn't ever thought of it... So composable characters aren't re-composed? a + ` isn't "translated" to à if you go from Unicode to Win-1252? – xanatos Oct 30 '11 at 10:53
- @JonSkeet Just tested. No, it doesn't. But the composable ` is changed to a "standard" `. – xanatos Oct 30 '11 at 12:45