How can I get the correct decimal value for extended ASCII characters based on Windows-1252? I found that a few symbols return their Unicode value instead of the Windows-1252 value, for example:
symbol: ’ expected: 146 returned: 8217
symbol: ” expected: 148 returned: 8221
’ (8217) is the Unicode character 'RIGHT SINGLE QUOTATION MARK' (U+2019).
” (8221) is the Unicode character 'RIGHT DOUBLE QUOTATION MARK' (U+201D).
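It seems that your code is expecting:
’ (146), which as a Unicode code point is the control character 'PRIVATE USE TWO' (U+0092).
” (148), which as a Unicode code point is the control character 'CANCEL CHARACTER' (U+0094).
146 and 148 are not the Unicode values of these quotation marks. They are the byte values (0x92 and 0x94) that the Windows-1252 code page assigns to them.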
So your code expects Windows-1252 byte values but receives Unicode code points: the Windows-1252 to Unicode decoding already took place when Java read your original file as text.
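To make the decoding step concrete, here is a small sketch of what happens to the two bytes when they are decoded as Windows-1252. The class and method names (DecodeDemo, decodedCodePoint) are made up for illustration, and it assumes your JDK provides the "windows-1252" charset, which mainstream JDKs do.

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    // Decode a single Windows-1252 byte and return the resulting Unicode code point.
    // Assumption: the "windows-1252" charset is available on this JVM.
    static int decodedCodePoint(int byteValue) {
        byte[] raw = { (byte) byteValue };
        String text = new String(raw, Charset.forName("windows-1252"));
        return text.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(decodedCodePoint(146)); // prints 8217 (U+2019)
        System.out.println(decodedCodePoint(148)); // prints 8221 (U+201D)
    }
}
```

This is the same conversion a Reader does for you when the file is opened as Windows-1252 text, which is why you see 8217 and 8221 afterwards.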
Solution: since your file is already encoded in Windows-1252 and your code wants those byte values, read the file as bytes, not as text.
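A minimal sketch of that approach, assuming the file fits in memory. The names Cp1252Values, byteValues, unsignedValues and cp1252Value are hypothetical. The last helper shows an alternative for text that has already been decoded: encode it back to Windows-1252. It again assumes the "windows-1252" charset is available on your JVM.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class Cp1252Values {
    // Assumption: this charset name is supported by the running JVM.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Java bytes are signed (-128..127), so mask with 0xFF to get 0..255.
    static int[] unsignedValues(byte[] raw) {
        int[] values = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            values[i] = raw[i] & 0xFF;
        }
        return values;
    }

    // Read the file as raw bytes so that no charset decoding takes place.
    static int[] byteValues(Path file) throws IOException {
        return unsignedValues(Files.readAllBytes(file));
    }

    // Alternative for text that is already decoded: encode the character back
    // to Windows-1252. Characters that cannot be mapped come back as '?' (63).
    static int cp1252Value(char c) {
        byte[] encoded = String.valueOf(c).getBytes(CP1252);
        return encoded[0] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(cp1252Value('\u2019')); // prints 146
        System.out.println(cp1252Value('\u201D')); // prints 148
    }
}
```

Without the & 0xFF mask you would get -110 and -108 instead of 146 and 148, because byte is a signed type in Java.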