I'm having a problem with cin.get():

I read a char and convert it to an int, but when I enter the character through the console, the result is different from when it's already set in the code.

Here is an example:

#include <iostream>
using namespace std;

// Returns the numeric code of a single byte
int ord(unsigned char chr){
    return int(chr);
}
int main(){
    unsigned char chr='ň'; // my constant character 'ň' for now
    cout<<ord(chr)<<endl; // outputs 242, which is what I want because it matches PHP
    chr=cin.get(); // now replace the constant 'ň' with 'ň' typed at the console
    cout<<ord(chr)<<endl; // outputs 229, which is wrong for me because it does not match PHP
}

How can I fix this?

I want to get 242, not 229. It must be the same as the result of ord() in PHP.

3 Answers

The source file and the console input go through two different processes that interpret the character and turn it into a code. The first is typed into a text editor and converted by the compiler; the second is interpreted by the OS and the console library.

The value 242 corresponds to the character in the ISO/IEC 8859-2 or Windows 1250 code page.

I'm not sure where the value 229 comes from, but almost certainly a different code page is being used to assign a value to the character, perhaps code page 852.

Mark Ransom
  • I'd guess that the 229 character is from using OEM code page 852 in the console window. – Michael Burr Jul 26 '12 at 19:37
  • @MichaelBurr, I just figured that out myself. It took a lot of trial and error. It makes sense that this would be used by a console window. – Mark Ransom Jul 26 '12 at 19:40
  • I tried this script to get what charset my application uses: cout< –  Jul 27 '12 at 12:14
  • @DieMeine, the encoding of the input is not determined by the C++ locale, it's determined by the Cmd window. There's a `chcp` command but I don't know how well it works. – Mark Ransom Jul 27 '12 at 13:26
  • Well, it works, but only partially. When I change chcp to 1250 and press "ň" on my keyboard, cin.get() outputs "˛", but when I select and copy the console output (˛) and paste it here, it pastes ň :D –  Jul 27 '12 at 16:37

The problem is that your console is reading characters in from code page 852, where ň is encoded at code point 229 (0xE5), but you want its value in ISO 8859-2 (aka Latin-2), where ň is encoded at code point 242 (0xF2).
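
If staying with single-byte encodings, one workaround is to remap the byte after reading it. The following is only a minimal sketch: the function name cp852_to_latin2 is hypothetical, and it handles just the one mapping stated above (229 to 242). A real conversion would need a complete table for all bytes above 127.

```cpp
// Sketch: remap one console byte from code page 852 to ISO 8859-2.
// Assumption: only the mapping for ň given above is included; every other
// byte passes through unchanged, which is correct for ASCII (0..127) but
// NOT for other accented letters.
constexpr unsigned char cp852_to_latin2(unsigned char byte) {
    return byte == 0xE5 ? static_cast<unsigned char>(0xF2)  // ň: 229 -> 242
                        : byte;
}

// Compile-time checks of the mapping described above.
static_assert(cp852_to_latin2(229) == 242, "229 in CP852 should become 242");
static_assert(cp852_to_latin2('A') == 'A', "ASCII bytes stay unchanged");
```

In the question's code this would be applied as chr = cp852_to_latin2(cin.get()); before calling ord().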

I'd strongly suggest that you abandon this approach and work with Unicode exclusively, which doesn't have these types of issues. Dealing with non-Unicode encodings such as the ISO 8859 variants and the DOS code pages is just asking for a world of headaches.

To use Unicode data, see this question. In Unicode, ň is code point U+0148.
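
To illustrate what a Unicode code point looks like at the byte level, here is a small sketch that assumes the text arrives as UTF-8 (an assumption; the console in the question is not necessarily configured that way). In UTF-8, U+0148 is stored as the two bytes 0xC5 0x88, and the hypothetical helper decode_utf8_2 recombines them. It handles only two-byte sequences and does no validation.

```cpp
// Sketch: decode a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into its
// Unicode code point. No validation is done; a real decoder must also
// handle 1-, 3- and 4-byte sequences and reject malformed input.
constexpr unsigned int decode_utf8_2(unsigned char lead, unsigned char trail) {
    return ((lead & 0x1Fu) << 6)   // low 5 bits of the lead byte
         | (trail & 0x3Fu);        // low 6 bits of the continuation byte
}

// ň is U+0148, encoded in UTF-8 as 0xC5 0x88.
static_assert(decode_utf8_2(0xC5, 0x88) == 0x0148, "expected U+0148");
```

Unlike the code page values 229 and 242, this result does not depend on which legacy code page the console or the editor happens to use.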

Also, this is not relevant to your problem, but your ord() function is unnecessary. cin.get() already returns an int, and an unsigned char is implicitly converted to an int.
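
As a rough illustration of that last point, the value returned by get() can be stored in an int directly. The names read_code and demo_read_code are hypothetical, and an in-memory stream stands in for the console so the byte value is known in advance.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper: returns the next byte's code (0..255),
// or the stream's end-of-file value if nothing could be read.
int read_code(std::istream& in) {
    return in.get();  // get() already returns an int; no ord() needed
}

// In-memory stream standing in for the console; it holds the single byte 0xF2.
int demo_read_code() {
    std::istringstream in("\xF2");
    int code = read_code(in);
    assert(code == 242);  // the byte is reported as unsigned, not as -14
    return code;
}
```

With the real console, read_code(std::cin) returns whatever byte the console's code page produces, so this simplifies the code but does not by itself change 229 into 242.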

Adam Rosenfield
  • I tried to use wstring and wchar_t as described there, so I replaced cout with wcout and cin with wcin and also added those headers, but the result is still the same: 229 –  Jul 27 '12 at 08:41

The problem is that the character ň is not an ASCII character and therefore has no ASCII code.

Both PHP ord() and C++ ord() promise undefined results when given a character that is not ASCII.

Drew Dormann