I am trying to parse a US7ASCII file in Java, using the code below:

FileInputStream fileInputStream = new FileInputStream(file); 
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, charSetName);

In line 2, the second parameter is the charset name that I need to pass.

The charsets supported in Java are provided in the link below: https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html

I could not find any entry for US7ASCII there. Can someone help me identify which charset name I should pass for US7ASCII-encoded files?

Zeus
  • Did you try `US-ASCII`? – g00se Aug 24 '21 at 09:06
  • Actually I think even UTF-8 will work for < 0x7F but US-ASCII is the 'proper' encoding – g00se Aug 24 '21 at 09:15
  • `US7ASCII` is what Oracle calls its ASCII character set. As far as I know that setting *doesn't actually guarantee that all data is ASCII*. In other words: it will accept/store/return non-ASCII characters in some situations, so the *data* that you get might not actually be pure ASCII data, if you're unlucky. If that's the case you'll have a difficult time interpreting it unless you know exactly what encoding was really intended (and it's consistent within your database, which is by no means guaranteed). – Joachim Sauer Aug 24 '21 at 09:30

1 Answer

You should use "US-ASCII", but "ISO-8859-1", "UTF-8", and probably a few other encodings would work as well.

The Java character set / encoding with the name "US-ASCII" is defined to be

"Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set."

See the javadocs for Charset.
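As a minimal sketch of how this could look in code: the constant `StandardCharsets.US_ASCII` (available since Java 7) refers to the same charset as the name "US-ASCII" and avoids passing a string. The class name `AsciiReaderSketch` and the helper `readFirstLine` are made up for illustration, and an in-memory byte array stands in for the `FileInputStream` from the question.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class AsciiReaderSketch {
    // Hypothetical helper: reads the first line of a stream,
    // decoding the bytes as 7-bit US-ASCII.
    static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: these bytes stand in for new FileInputStream(file).
        byte[] sample = "hello\nworld\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readFirstLine(new ByteArrayInputStream(sample)));
    }
}
```

If a charset name string is required, `new InputStreamReader(fileInputStream, "US-ASCII")` should behave the same way, at the cost of a checked `UnsupportedEncodingException`.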

The first 128 codes of "ISO-8859-1" are the same as the 7-bit ASCII codes. And although "UTF-8" is a variable-length encoding, its first 128 codes are also the same as the 7-bit ASCII codes. This means that both would work for reading proper 7-bit ASCII files; i.e. those that contain only the 7-bit codes. (But problems could arise if there are stray 8-bit codes; i.e. bytes in the range 128 to 255.)
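If stray 8-bit codes are a concern, one option is to decode strictly so that they are reported instead of passing through unnoticed. As far as I know, an `InputStreamReader` built from a charset quietly substitutes a replacement character for bytes it cannot decode, whereas a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws. The sketch below assumes the whole input fits in memory; the class name `StrictAsciiSketch` and the helper `decodeStrict` are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictAsciiSketch {
    // Hypothetical helper: decodes bytes as US-ASCII and throws
    // if any byte falls outside the 7-bit range 0..127.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] clean = {0x48, 0x69};         // "Hi"
        byte[] stray = {0x48, (byte) 0xE9};  // 'H' followed by a byte above 127
        for (byte[] input : new byte[][] {clean, stray}) {
            try {
                System.out.println("decoded: " + decodeStrict(input));
            } catch (CharacterCodingException e) {
                System.out.println("not pure ASCII: " + e);
            }
        }
    }
}
```

A failure here only tells you the data is not pure ASCII; working out which encoding was actually intended is a separate problem.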

Stephen C