So here's my problem. I'm trying to read a file encoded in Windows-1252 that contains bytes which are not valid in that encoding. If we look at the code page table here:
https://en.wikipedia.org/wiki/Windows-1252
We can see that code points 129, 141, 143, 144 and 157 (0x81, 0x8D, 0x8F, 0x90 and 0x9D) are undefined; that is, they don't represent any character. But those bytes are still in the file and I need to read them.
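A quick way to check which byte values Java's own windows-1252 charset treats as undefined is to decode each byte on its own with a `CharsetDecoder` set to report errors instead of replacing them. This is only a diagnostic sketch; the class name `UndefinedBytes` is made up for illustration. On a standard JDK it should list the same five values as the table above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

public class UndefinedBytes {

    // Returns every byte value (0-255) that the charset's decoder refuses
    // to map when errors are REPORTed instead of silently replaced.
    static List<Integer> undefinedBytes(Charset cs) {
        List<Integer> result = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(new byte[] {(byte) b}));
            } catch (CharacterCodingException e) {
                // Decoder reported this byte as malformed/unmappable.
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(undefinedBytes(Charset.forName("windows-1252")));
    }
}
```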
In VB.NET, if I read the file like so:
Dim str As String = File.ReadAllText(filePath, System.Text.Encoding.GetEncoding("Windows-1252"))
Then I get something like:
‘0*qYªI" & ChrW(141) & ChrW(141) & "#´xXVzAÍ" & ChrW(157) & "Ä’¾Ä" & ChrW(141) & "e5b2©wÔ¤x–&¥®-1]¬ŠvVco‡|kC®i
Here you can see that the invalid characters keep their real values from the file (ChrW(141) and ChrW(157)), even though they are not printable. But if I do this in Java:
String str = FileUtils.readFileToString(new File(pathToFile), "Windows-1252");
The value I get for those characters when reading the file is 63, which is the character "?". From what I understand of https://stackoverflow.com/a/2147968, Java notices that the character is not valid for that encoding and just puts a replacement character ("?") in its place.
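A small sketch that may explain where the 63 comes from. It assumes that `FileUtils.readFileToString` decodes the same way as `new String(byte[], Charset)`, i.e. it replaces undecodable bytes rather than failing, and that the 63 was seen after the text was turned back into bytes (the question doesn't say how the value was inspected). Under those assumptions, byte 0x8D should decode to U+FFFD (the Unicode replacement character), and encoding that back to Windows-1252 yields `'?'` (63). `ReplacementDemo` is a hypothetical name.

```java
import java.nio.charset.Charset;

public class ReplacementDemo {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // new String(byte[], Charset) always replaces undecodable bytes
    // with the decoder's replacement instead of throwing.
    static String decode(byte[] raw) {
        return new String(raw, CP1252);
    }

    public static void main(String[] args) {
        // "A", then 0x8D (141, undefined in Windows-1252), then "B".
        byte[] raw = {0x41, (byte) 0x8D, 0x42};

        String decoded = decode(raw);
        System.out.printf("decoded char:    U+%04X%n", (int) decoded.charAt(1));

        // U+FFFD has no Windows-1252 byte either, so getBytes substitutes
        // the encoder's default replacement byte '?'.
        byte[] reEncoded = decoded.getBytes(CP1252);
        System.out.println("re-encoded byte: " + reEncoded[1]);
    }
}
```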
My question is: how can I get the real values when reading the text, even if they are not valid? Is there a way to stop Java from inserting replacement characters when it reads invalid characters, or am I missing something else?
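One possible workaround, sketched here as an assumption rather than a built-in library feature: read the raw bytes and decode them through a 256-entry table built from Java's windows-1252 decoder, falling back to the byte value itself (U+0081, U+008D, ...) wherever the decoder reports a byte as undefined. That mirrors the ChrW(141)/ChrW(157) values in the VB.NET output. `LenientCp1252`, `decode` and `readFile` are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LenientCp1252 {

    // One char per possible byte value, built once.
    private static final char[] TABLE = buildTable();

    private static char[] buildTable() {
        Charset cp1252 = Charset.forName("windows-1252");
        char[] table = new char[256];
        for (int b = 0; b < 256; b++) {
            CharsetDecoder decoder = cp1252.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                CharBuffer out = decoder.decode(ByteBuffer.wrap(new byte[] {(byte) b}));
                table[b] = out.get(0);
            } catch (CharacterCodingException e) {
                // Undefined in Windows-1252: keep the raw byte value as the
                // code point, like the VB.NET result.
                table[b] = (char) b;
            }
        }
        return table;
    }

    // Maps each byte through the table; never inserts replacement characters.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(TABLE[b & 0xFF]);
        }
        return sb.toString();
    }

    static String readFile(Path path) throws IOException {
        return decode(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(readFile(Paths.get(args[0])));
            return;
        }
        // Euro sign, two undefined bytes, then "A".
        String s = decode(new byte[] {(byte) 0x80, (byte) 0x8D, (byte) 0x9D, 0x41});
        s.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

If the text is later written back with Java's windows-1252 encoder, those control characters will likely become `'?'` as well, so a matching reverse table would probably be needed for a lossless round trip.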