
I understand that byte is the underlying data type of Java IO, but why is byte used for reading and writing when its value range is only -128 to 127? These values are integers, so how can integers be used for reading and writing symbols, characters, or binary data?

Expectation is to understand why byte data type is used for Java IO.

Animesh
  • What would you use instead? On the lowest level a file is a sequence of bytes in most modern operating systems. Text for example is handled by encoding the characters as a sequence of one or more bytes per character. – Henry Jan 31 '19 at 15:09
  • The obvious choice would be an unsigned byte. Many people consider the absence of it in Java a serious design flaw. But in its absence, what else could you use other than signed bytes? – biziclop Jan 31 '19 at 15:17
  • @Animesh: Are you asking why Java's `byte` data type has the specific range -128 to 127, or the more fundamental question of what a "byte" is and why "binary data" is made up of "bytes"? The latter question is not specific to Java. The phrasing "how can integers be used for reading and writing different symbols characters or binary data" seems to imply the latter question, and it's a very fundamental one. – Daniel Pryden Jan 31 '19 at 15:45
  • I was confused by a program while reading a binary file with Java. The output was not as expected. This made me think: is byte the most appropriate data type, or could there be a better one for reading or writing files? It seems byte has been considered the standard for reading and writing files since the days ASCII came into the picture. Americans made it a standard to use only 8 bits for file read/write operations. A different number of bits could also be used as the smallest unit for reading and writing files. I guess byte has become a de facto standard because it was never challenged. – Animesh Feb 03 '19 at 14:55
  • @Animesh: Historically, the relationships between a "byte", a "char" and a "machine word" were all somewhat fluid, and hardware still exists where those have unusual sizes. (See [Is a byte always 8 bits?](https://stackoverflow.com/questions/13615764/is-a-byte-always-8-bits)) POSIX standardized "byte" to mean a grouping of 8 bits exactly. In modern hardware, a byte is the smallest addressable unit of memory, although it may be difficult to access a byte that is not naturally aligned to the machine word size (often 32 or 64 bits nowadays). Java follows POSIX and makes `byte` an 8-bit value. – Daniel Pryden Feb 24 '19 at 21:38
  • @Animesh: Although the size of a byte was historically tied up with the size of a textual character, Java actually broke with the C tradition and made `byte` and `char` different sizes. Java picked up the Unicode 1.0 standard for encoding text, which was the latest standard at the time, and made `char` a 16-bit unsigned value. (Unicode has since outgrown the BMP and now 16 bit characters are awkward to work with, but it seemed like a great idea 25 years ago.) Depending on your character encoding, a single character could take up as many as 6 bytes, or more if you add diacritical marks. – Daniel Pryden Feb 24 '19 at 21:43

1 Answer


Java was designed after C/C++, and several design decisions were made in reaction to them:

  • Java text (String, Reader/Writer) is Unicode, so text in mixed scripts can be combined. Internally String was an array of UTF-16 char; a .class file uses UTF-8 string constants. Hence byte[] (InputStream/OutputStream) is only for binary data. Between text and binary data there is always a conversion using the binary data's encoding/charset.

  • Numerical primitive types exist only in a signed version. (Except char, which one could consider non-numeric.) The idea was to root out the signed/unsigned "problems" of C++. So byte is signed too, from -128 to 127. However, the bit pattern is what matters in Java too, and a narrowing cast simply keeps the low 8 bits, so one can write:

    byte b = (byte) 255; // 0xFF or -1

  • The primitive types byte/short/int/long have fixed sizes on every platform, whereas C's integer sizes were notoriously platform-dependent, and fixed-width workarounds such as C's uint32 were a bit ugly.
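
The bullets above can be sketched in a few lines. This is a minimal illustration, not taken from any particular program: the class name `ByteIoSketch` and its helper methods are hypothetical, and only the `java.io` and `java.nio.charset` calls are real API. It shows that a signed byte still carries all 8 bits, that `read()` hands back each byte as an int in the range 0..255, and that text only becomes bytes through an explicit charset.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and helper methods are illustrative only.
final class ByteIoSketch {

    // Widen a signed byte (-128..127) to its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Encode text to bytes with an explicit charset, then decode it back.
    static String roundTrip(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    // Sum the unsigned values of all bytes, reading them through a stream.
    static int sumUnsigned(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int sum = 0;
        int next;
        while ((next = in.read()) != -1) { // read() yields 0..255, or -1 at end of stream
            sum += next;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte b = (byte) 255;
        System.out.println(b);                                                // -1
        System.out.println(toUnsigned(b));                                    // 255
        System.out.println("\u00e9".getBytes(StandardCharsets.UTF_8).length); // 2
        System.out.println(sumUnsigned(new byte[] {(byte) 0xFF, 1}));         // 256
    }
}
```

Masking with `0xFF` is the usual idiom when a binary file's bytes need to be treated as 0..255; forgetting it is a common cause of unexpected negative values when reading binary data.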

Having experienced tricky C bugs with signed/unsigned myself (more than 10 years ago), I think this decision was okay.

It is easier to calculate in a signed mindset and only at the end interpret the values as unsigned than to mix signed and unsigned parts throughout the expressions.
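
As a rough sketch of that approach (the class and method names are hypothetical): the addition is done with plain signed int arithmetic, and the resulting 32 bits are reinterpreted as unsigned only at the very end. This works because two's-complement addition produces the same bit pattern under either interpretation.

```java
// Hypothetical demo class; names are illustrative only.
final class SignedThenUnsignedSketch {

    // Add two 32-bit values with ordinary signed int arithmetic, then
    // reinterpret the resulting bit pattern as an unsigned number in a long.
    static long addAsUnsigned(int a, int b) {
        int sum = a + b;          // wraps modulo 2^32; the bits are the same either way
        return sum & 0xFFFFFFFFL; // view the 32 bits as 0..4294967295
    }

    public static void main(String[] args) {
        int a = 0xFFFFFFF0;                      // -16 signed, 4294967280 unsigned
        System.out.println(a + 5);               // -11
        System.out.println(addAsUnsigned(a, 5)); // 4294967285
    }
}
```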

Nowadays (since Java 8) there is support in Java for calculations that respect an unsigned interpretation of values, like Integer.parseUnsignedInt.
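
A short sketch of those helpers, assuming Java 8 or later. The class name `UnsignedSketch` and its two wrapper methods are hypothetical; the `Integer` and `Byte` static methods they call are part of `java.lang`.

```java
// Hypothetical demo class; only the java.lang methods are real API (Java 8+).
final class UnsignedSketch {

    // Interpret a signed byte as 0..255 without manual masking.
    static int byteAsUnsigned(byte b) {
        return Byte.toUnsignedInt(b);
    }

    // Parse a decimal value that may exceed Integer.MAX_VALUE into an int bit pattern.
    static int parse(String s) {
        return Integer.parseUnsignedInt(s);
    }

    public static void main(String[] args) {
        int bits = parse("4294967295");                           // all 32 bits set
        System.out.println(bits);                                 // -1 when read as signed
        System.out.println(Integer.toUnsignedString(bits));       // 4294967295
        System.out.println(Integer.toUnsignedLong(bits));         // 4294967295
        System.out.println(Integer.compareUnsigned(bits, 1) > 0); // true
        System.out.println(Integer.divideUnsigned(bits, 2));      // 2147483647
        System.out.println(byteAsUnsigned((byte) 0xFF));          // 255
    }
}
```

The values are still stored in signed int and byte variables; these methods only change how the bits are interpreted for parsing, printing, comparison, and division.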

Joop Eggen
  • Is there a document or record somewhere of the different topics discussed and the decisions taken at the creation of Java? – vincrichaud Jan 31 '19 at 15:42
  • Despite my grey beard I was not involved, and have it from hearsay. The [wikipedia](https://en.wikipedia.org/wiki/Java_(programming_language)) article only offers some platitudes. But there seems to have been a rather academic exchange in the group. Other decisions, like allowing only single inheritance and having multiple interfaces instead, were carefully made. – Joop Eggen Jan 31 '19 at 15:48
  • https://en.wikibooks.org/wiki/Java_Programming/History is also not really an essay – Joop Eggen Jan 31 '19 at 16:07