Does a byte stream encode bytes to characters, or does it only operate on bytes?

Question

We have byte streams and character streams. If you read some examples on the internet, you will find that a byte stream only operates on bytes and nothing more.

I once read that both kinds of stream convert bytes to characters depending on the encoding: UTF-8 for a byte stream, UTF-16 for a character stream. If both of them convert bytes to characters, why is it written everywhere that a byte stream operates on bytes only? Can a byte stream read data other than bytes and then just convert it to bytes?

And then why do we need an encoding in a byte stream?

Some popular websites did not help me.

Java only concerns itself with streams in one of two ways: either with subclasses of `InputStream` or subclasses of `Reader`. The latter concerns itself with decoding a stream of bytes depending on the encoding that was used originally — g00se, Jan 22 '23 at 16:55
You need byte streams when you deal with low-level stuff like some network protocols, when you talk to hardware, or when you deal with encryption and hashing. For mostly everything else you want encodings so that non-ascii characters are displayed correctly. So when you get a bunch of bytes from somewhere, you need to know the encoding to convert the bytes to the right characters. — Robert, Jan 22 '23 at 17:03
@g00se When `stream` is not better specified, it can also be an ObjectInputStream or a Stream type of object - both of which do not necessarily work on bytes. Although here I agree a 'byte stream' will just work on bytes. Why should it en/decode characters? — Queeg, Jan 22 '23 at 19:00

score 1 · Accepted Answer · answered Jan 22 '23 at 17:59

I once read that both kinds of stream convert bytes to characters depending on the encoding: UTF-8 for a byte stream, UTF-16 for a character stream. If both of them convert bytes to characters, why is it written everywhere that a byte stream operates on bytes only? Can a byte stream read data other than bytes and then just convert it to bytes?

Everything in a typical modern computer has to be represented in bytes: a file holds a sequence of bytes, a network connection lets you send a sequence of bytes, a pointer identifies the location of a byte in memory, and so on. So a byte stream — an InputStream or OutputStream or the like — provides basic processing to let you read or write a sequence of bytes, no matter what kind of data is being represented by those bytes. The data might be text encoded as UTF-8 or UTF-16 or some other encoding, or it might be an image in a GIF or PNG or JPEG or other format, or it might be audio data or video data or a PDF or a Word document or . . . well, you get the idea.
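As a minimal sketch of that point (the class and method names here are made up for illustration), an InputStream hands back raw byte values and makes no attempt to interpret them. The input below happens to be UTF-8 text, but the stream neither knows nor cares:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawBytesDemo {
    // Reads every byte from the stream. read() returns 0..255, or -1 at end of stream.
    static List<Integer> readAll(InputStream in) {
        List<Integer> values = new ArrayList<>();
        try {
            int b;
            while ((b = in.read()) != -1) {
                values.add(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is one character, but UTF-8 stores it as two bytes.
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8);
        // The stream reports the two raw byte values, not a character.
        System.out.println(readAll(new ByteArrayInputStream(data))); // [195, 169]
    }
}
```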

A character stream — a Reader or Writer — provides a higher level of processing specifically for text data, so that you don't need to worry about the specific bytes being used to represent the characters, you just need to worry about the characters themselves. You just need to tell the character stream which character encoding to use (or let it use an appropriate default), and it can handle the rest from there.
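A small sketch of the character-stream side, assuming the input is UTF-8 text (the class and helper names are hypothetical): wrapping a byte stream in an InputStreamReader with an explicit charset turns the two bytes of an encoded character into a single char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {
    // Decodes the given bytes through a Reader and returns the resulting text.
    static String decode(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) { // read() returns one char at a time, or -1
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        String text = decode(data, StandardCharsets.UTF_8);
        System.out.println(text.length()); // 1: two bytes became one char
    }
}
```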

But there's one big complication: Java didn't introduce this distinction until version 1.1, and because Java aims for a very high degree of backward-compatibility, there are some classes that survive from version 1.0 that kind of straddle the line. In particular, there is a PrintStream class that extends OutputStream and adds special 'print' methods that take more convenient types, such as String, and handle the character encoding internally. That PrintStream class has been there since version 1.0, and is still in wide use, especially because System.out and System.err are instances of it. (In theory, we should be using PrintWriter instead.)
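To illustrate how PrintStream straddles the line, here is a hedged sketch (the class and helper names are invented for this example). It uses the constructor that takes an encoding name, then calls both a character-oriented method and a byte-oriented one on the same object:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class PrintStreamDemo {
    // Prints text through a PrintStream using the named encoding, then one raw byte.
    static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charsetName)) {
            out.print(text);  // character-oriented: the PrintStream encodes the String itself
            out.write(0x21);  // byte-oriented: the byte for '!' is passed through unchanged
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // The same single character costs a different number of bytes per encoding.
        System.out.println(printWith("\u00e9", "UTF-8").length);      // 3
        System.out.println(printWith("\u00e9", "ISO-8859-1").length); // 2
    }
}
```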

And then why do we need an encoding in a byte stream?

We need a character encoding in whatever layer is converting between character sequences and byte sequences. Normally that layer is separate from the byte stream, but as I mentioned above, there are some holdovers from version 1.0 that handle the conversion themselves, which means they need to know which encoding to use.

Good point about `PrintStream`. I'd like to add that it's *almost never* used outside of its use as the type of `System.out` and `System.err`, precisely because it's a weird in-between thing that doesn't fit the current system. — Joachim Sauer, Jan 22 '23 at 18:06
I get the point. So technically we have System.in, which is an InputStream, so it's a byte stream. Then if I write some data to the console, is it simply transformed(?) to bytes? Or is it just that, because it's a byte stream, we use the proper class to read them? — Sometimes me, Feb 02 '23 at 01:38
@Sometimesme: I think you're mixing up input and output; writing data to the console doesn't involve System.in or InputStream. So, I'll assume that you meant to write "System.out" and "OutputStream" . . . Re: "we have System.out which is OutputStream so it’s byte stream": Well, sort of. As I mentioned in this answer, PrintStream straddles the line between "byte stream" and "character stream"; it's a byte stream *plus* a bunch of character-oriented methods (named 'print', 'printf', and 'println') that handle conversion from character sequences to byte sequences. Does that make sense? — ruakh, Feb 02 '23 at 22:18
So an InputStream is involved when we read some data from the console? And if we write data to the console, is it decoded to something and stored somewhere? — Sometimes me, Feb 02 '23 at 23:22

score 1 · Answer 2 · answered Jan 22 '23 at 18:05

The system is fundamentally quite straightforward, but it can be confusing because it assumes some background knowledge and because several parts can interact.

Let's put down some fundamental truths/axioms:

An InputStream is fundamentally about reading bytes from somewhere.
An OutputStream is fundamentally about writing bytes to somewhere.
Reader/Writer are the equivalent of those two for chars/String/text.
In the Java world, as long as you handle only String (or its related types like StringBuilder, ...), you don't need to care about encoding. It will always look like UTF-16, but you might as well pretend no encoding happens.
If you only ever handle byte[] (and related types like ByteBuffer), then you also don't need to care about encoding.
The encoding only ever comes into play when you want to cross over from the byte[] world to the String world (or the other way around).
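The last point can be sketched with plain String/byte[] conversions, no streams involved (the class and helper names are hypothetical). The same text yields different bytes under different charsets, and decoding with the wrong charset silently produces different text:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BoundaryDemo {
    // String world -> byte[] world: a charset is required.
    static byte[] toBytes(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // byte[] world -> String world: a charset is required again.
    static String toText(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // one char: e with acute accent

        System.out.println(toBytes(text, StandardCharsets.UTF_8).length);      // 2
        System.out.println(toBytes(text, StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(toBytes(text, StandardCharsets.UTF_16BE).length);   // 2

        byte[] utf8 = toBytes(text, StandardCharsets.UTF_8);
        // Matching charset: the round trip restores the original text.
        System.out.println(toText(utf8, StandardCharsets.UTF_8).equals(text)); // true
        // Mismatched charset: each byte becomes its own char, so the text changes.
        System.out.println(toText(utf8, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```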

So some Writer classes like OutputStreamWriter take a Charset to construct. And that's precisely because it's one of those borders that I mentioned in the last point above: it handles both String and byte[] (indirectly), because it is a Writer that writes to an OutputStream, and for that to work it needs to convert the String that gets written to it into a byte[] that it can forward to the OutputStream.
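A minimal sketch of that border, with a ByteArrayOutputStream standing in for whatever real destination you would use (the class and helper names are invented for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterBorderDemo {
    // Writes text through an OutputStreamWriter and returns the bytes that reached the stream.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text); // String goes in; the writer encodes it for the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray(); // closing the writer above flushed everything
    }

    public static void main(String[] args) {
        String text = "h\u00e9"; // 'h' followed by e with acute accent
        System.out.println(write(text, StandardCharsets.UTF_8).length);    // 3
        System.out.println(write(text, StandardCharsets.UTF_16BE).length); // 4
    }
}
```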

Other Writers (such as StringWriter) don't transfer data between those two worlds: a StringWriter takes in String and produces String, so no conversion is necessary.

Similarly, a ByteArrayInputStream is an InputStream that reads from a byte[], so again both the input and the output live in "the same world". No conversion is necessary, and thus no Charset parameter exists.
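For contrast, a short sketch of the two "same world" classes (the wrapper class and method names are hypothetical). Neither of them takes a Charset, because nothing crosses between text and bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;

public class SameWorldDemo {
    // Text in, text out: StringWriter never needs an encoding.
    static String viaStringWriter(String text) {
        StringWriter writer = new StringWriter();
        writer.write(text);
        return writer.toString();
    }

    // Bytes in, bytes out: ByteArrayInputStream never needs an encoding either.
    static int firstByte(byte[] data) {
        return new ByteArrayInputStream(data).read(); // 0..255, or -1 if data is empty
    }

    public static void main(String[] args) {
        System.out.println(viaStringWriter("h\u00e9").length());   // 2
        System.out.println(firstByte(new byte[] {(byte) 0xE9}));   // 233
    }
}
```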

tl;dr the "purity" of InputStream/OutputStream/Reader/Writer exists as long as you look only at those interfaces. When you look at specific implementations some of those will need to convert from the text world to the binary world (or vice versa) and those implementations will need to handle both worlds.
