I have a text file which is encoded with codepage 850. I am reading this file the following way:

using (var reader = new StreamReader(filePath, Encoding.GetEncoding(850)))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        //...
    }
    //...
}

Now, for every character of the string line in the loop above, I need the zero-based index that the character has in codepage 850, something like:

for (int i = 0; i < line.Length; i++)
{
    int indexInCodepage850 = GetIndexInCodepage850(line[i]); // ?
    //...
}

Is this possible, and what could int GetIndexInCodepage850(char c) look like?

Hulda

3 Answers

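Call GetBytes() on the CP850 Encoding instance (the one returned by Encoding.GetEncoding(850)) and pass it the line. CP850 is an 8-bit encoding, so for a string that was decoded from CP850 the byte array has exactly as many elements as the string has characters, and each element is the codepage 850 value of the character at the same position.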
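A minimal sketch of how this could look, not a definitive implementation. The class name `Cp850` and the helper `GetIndexes` are hypothetical; `GetIndexInCodepage850` is the name used in the question. The `Encoding.RegisterProvider` call assumes .NET Core / .NET 5 or later, where code page 850 is not available by default; on the classic .NET Framework that line should not be needed.

```csharp
using System;
using System.Text;

public static class Cp850
{
    private static readonly Encoding Encoding850;

    static Cp850()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(850) succeeds.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding850 = Encoding.GetEncoding(850);
    }

    // Returns the codepage 850 value of a single character.
    public static int GetIndexInCodepage850(char c)
    {
        byte[] bytes = Encoding850.GetBytes(new[] { c });
        return bytes[0];
    }

    // Encodes a whole line at once; element i belongs to line[i]
    // as long as the line was originally decoded from CP850.
    public static byte[] GetIndexes(string line)
    {
        return Encoding850.GetBytes(line);
    }
}
```

Encoding the whole line once with `GetIndexes` is probably preferable to calling the per-character helper inside the loop, since it allocates a single array.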

xanatos
parsifal

Just read the file as bytes, and you have the codepage 850 character codes:

byte[] data = File.ReadAllBytes(filePath);

You don't get it separated into lines, though. The character codes for CR and LF that you need to look for in the data are 13 and 10.
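Splitting the raw bytes into lines could be sketched as follows. The class `ByteLines` and its methods are hypothetical names for illustration. The sketch assumes that CR LF, a lone CR, and a lone LF each count as one line break, which is how `StreamReader.ReadLine()` treats them.

```csharp
using System;
using System.Collections.Generic;

public static class ByteLines
{
    // Splits raw file bytes into lines at CR (13), LF (10) or CR LF.
    public static List<byte[]> SplitLines(byte[] data)
    {
        var lines = new List<byte[]>();
        int start = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != 10 && data[i] != 13) continue;
            lines.Add(Slice(data, start, i - start));
            // Treat CR immediately followed by LF as a single line break.
            if (data[i] == 13 && i + 1 < data.Length && data[i + 1] == 10) i++;
            start = i + 1;
        }
        // Keep a final line that is not terminated by a line break.
        if (start < data.Length) lines.Add(Slice(data, start, data.Length - start));
        return lines;
    }

    private static byte[] Slice(byte[] source, int offset, int count)
    {
        var result = new byte[count];
        Array.Copy(source, offset, result, 0, count);
        return result;
    }
}
```

Each returned array then holds the codepage 850 character codes of one line, without the line break bytes.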

Guffa
  • Good to know this way, but I also need the Unicode string, so I want to keep reading the file with the normal StreamReader. parsifal's solution is perfect for us. – Hulda Aug 31 '11 at 14:57
  • @Hulda: You can use the `Encoding.GetString` method to decode the bytes to get the string, instead of first decoding the data, and then encoding it again. – Guffa Aug 31 '11 at 15:12
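The suggestion in the last comment, reading the bytes once and decoding them with `Encoding.GetString`, might look roughly like this. The class `Cp850File` and the method `Load` are hypothetical names, and the `Encoding.RegisterProvider` call is an assumption that only applies to .NET Core / .NET 5 or later.

```csharp
using System.IO;
using System.Text;

public static class Cp850File
{
    // Returns both the raw codepage 850 codes and the decoded Unicode text.
    public static (byte[] Codes, string Text) Load(string filePath)
    {
        // Assumption: needed on .NET Core / .NET 5+ only; repeated calls are harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] codes = File.ReadAllBytes(filePath);
        string text = Encoding.GetEncoding(850).GetString(codes);
        return (codes, text);
    }
}
```

Because CP850 maps every byte to exactly one character, `Text[i]` corresponds to `Codes[i]`, so the data is decoded once and never has to be encoded again.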
You don't need to.

You are already specifying the encoding in the StreamReader constructor. The string returned from reader.ReadLine() will already have been encoded using CP850.

James Kyburz
  • I don't understand. Does a `string` in C# **have** an encoding? I thought a string is always stored as a Unicode string, isn't it? I cannot find any method or property on the `string` type which says something about encoding. – Hulda Aug 31 '11 at 12:06
  • @Hulda: That is correct, a string is always Unicode. If you want the code page 850 character code for a `char` you would need to encode it again. – Guffa Aug 31 '11 at 13:00
  • Internally .NET uses UTF-16, yes, but Encoding.GetBytes() will only achieve the same as setting the encoding on the StreamReader object. – James Kyburz Aug 31 '11 at 13:12