How can I determine the index in codepage 850 for a char in C#?

Question

I have a text file which is encoded with codepage 850. I am reading this file the following way:

using (var reader = new StreamReader(filePath, Encoding.GetEncoding(850)))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        //...
    }
    //...
}

Now, for every character of the string line in the loop above, I need the zero-based index that character has in codepage 850, something like:

for (int i = 0; i < line.Length; i++)
{
    int indexInCodepage850 = GetIndexInCodepage850(line[i]); // ?
    //...
}

Is this possible, and what could int GetIndexInCodepage850(char c) look like?

score 4 · Accepted Answer · edited Aug 31 '11 at 11:50

Use Encoding.GetBytes() on the line, with the codepage 850 encoding. CP850 is an 8-bit encoding, so the byte array should have just as many elements as the string has characters, and each element is the codepage 850 value of the corresponding character.
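A minimal sketch of this approach, assuming a modern .NET runtime where code page 850 has to be made available through `CodePagesEncodingProvider` (on .NET Framework the registration is not needed). The class name `Cp850` and the method `GetIndexes` are hypothetical names chosen for illustration; `GetIndexInCodepage850` is the name from the question. The sketch uses an exception fallback, which is a design choice rather than part of the answer: a character with no codepage 850 mapping throws instead of being silently replaced.

```csharp
using System;
using System.Text;

static class Cp850
{
    // Assumption: running on .NET Core / .NET 5+, where code page 850 is only
    // available after registering the code pages provider.
    private static readonly Encoding Strict = CreateStrict();

    private static Encoding CreateStrict()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        // Exception fallback: unmappable characters throw EncoderFallbackException
        // instead of being replaced.
        return Encoding.GetEncoding(
            850,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
    }

    // Encodes a whole line; for text that was decoded from CP850,
    // element i is the codepage 850 value of line[i].
    public static byte[] GetIndexes(string line) => Strict.GetBytes(line);

    // Encodes a single character and returns its codepage 850 value (0..255).
    public static int GetIndexInCodepage850(char c) => Strict.GetBytes(new[] { c })[0];
}
```

Inside the question's loop, calling `Cp850.GetIndexes(line)` once per line avoids allocating a one-element array per character; `GetIndexInCodepage850(line[i])` gives the same value for a single character.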

edited Aug 31 '11 at 11:50

xanatos

answered Aug 31 '11 at 11:40

parsifal

Guffa · Answer 2 · 2011-08-31T13:28:09.483

3

Just read the file as bytes, and you have the codepage 850 character codes:

byte[] data = File.ReadAllBytes(filePath);

You don't get it separated into lines, though. The character codes for CR and LF that you need to look for in the data are 13 and 10.
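If the decoded text is needed alongside the raw codes, the same byte array can be decoded with `Encoding.GetString`. The following is a sketch under the same assumption as before (a modern .NET runtime with `CodePagesEncodingProvider` available); `Cp850File` and `ReadCodesAndText` are hypothetical names. Because CP850 is a single-byte encoding, `Text[i]` should correspond to `Codes[i]`.

```csharp
using System;
using System.IO;
using System.Text;

static class Cp850File
{
    // Reads the file once and returns both the raw codepage 850 codes
    // and the decoded Unicode text.
    public static (byte[] Codes, string Text) ReadCodesAndText(string filePath)
    {
        // Assumption: needed on .NET Core / .NET 5+; unnecessary on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp850 = Encoding.GetEncoding(850);

        byte[] codes = File.ReadAllBytes(filePath); // one byte per CP850 character
        string text = cp850.GetString(codes);       // decoded text, including CR/LF
        return (codes, text);
    }
}
```

Line breaks remain in both results as codes 13 and 10, so splitting into lines is left to the caller.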

edited Aug 31 '11 at 13:28

answered Aug 31 '11 at 11:40

Guffa

Good to know this way, but I also need the Unicode string, so I want to keep reading the file with the normal StreamReader. parsifal's solution is perfect for us. – Hulda Aug 31 '11 at 14:57
@Hulda: You can use the `Encoding.GetString` method to decode the bytes to get the string, instead of first decoding the data, and then encoding it again. – Guffa Aug 31 '11 at 15:12

score 1 · Answer 3 · answered Aug 31 '11 at 11:53

You don't need to.

You are already specifying the encoding in the StreamReader constructor. The string returned from reader.ReadLine() will already have been encoded using CP850.

answered Aug 31 '11 at 11:53

James Kyburz

I don't understand. Does a `string` in C# **have** an encoding? I thought a string is always stored as Unicode string, isn't it? I cannot find any method or property on the `string` type which says something about encoding. – Hulda Aug 31 '11 at 12:06
@Hulda: That is correct, a string is always Unicode. If you want the code page 850 character code for a `char` you would need to encode it again. – Guffa Aug 31 '11 at 13:00
Internally .NET uses UTF-16, yes, but Encoding.GetBytes() will only achieve the same as setting the encoding on the StreamReader object. – James Kyburz Aug 31 '11 at 13:12
