When you get down to the bare metal, all data is stored in bits, which are binary (1 or 0). However, I sometimes see terms like "binary file", which implies the existence of files that aren't binary. The same goes for base64 encoding, which Wikipedia describes as a "binary-to-text encoding scheme". But if I'm not mistaken, text is also stored in a binary format on the hardware, so isn't base64 encoding ultimately converting binary to binary? Is there some other definition of "binary" I am unaware of?

1 Answer

You are right that deep down, every file is binary. By convention, however, a binary file is intended to be read as an array of bytes, where each byte has a value between 0 and 255. A text file is intended to be read as an array of characters.

When, in Python, I open a file with open("myfile", "r"), I am telling it that I expect the underlying file to contain characters, and that Python should do the necessary processing to give me characters. It may convert multiple bytes into a single character. It may canonicalize all possible newline combinations into a single newline character. The same character may be stored as different byte sequences depending on the encoding, but once decoded, they all give me the same character.

When I open a file with open("myfile", "rb"), I literally want the file read byte by byte, with no interpretation of what it is seeing.
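A rough sketch of the difference, in case it helps. The helper name read_both_ways and the sample string are made up for illustration, and UTF-8 is an assumed encoding, so treat this as a demonstration rather than anything definitive:

```python
import os
import tempfile

def read_both_ways(raw: bytes):
    """Write raw bytes to a temp file, then read it back in text and binary mode."""
    fd, path = tempfile.mkstemp()
    try:
        # Write the bytes exactly as given, with no translation.
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Text mode: bytes are decoded to characters, newlines are normalized.
        with open(path, "r", encoding="utf-8") as f:
            as_text = f.read()
        # Binary mode: the bytes come back untouched.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes

# "é" takes two bytes in UTF-8, and the first line ends with "\r\n".
raw = "café\r\nbar\n".encode("utf-8")
text, data = read_both_ways(raw)

print(repr(text))  # 'café\nbar\n'  (9 characters)
print(repr(data))  # b'caf\xc3\xa9\r\nbar\n'  (11 bytes)
```

The same 11 bytes on disk come back as 9 characters in text mode (two bytes became "é", and "\r\n" became "\n"), while binary mode returns all 11 bytes unchanged.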

Frank Yellin
  • So what really is the difference between "binary" data and text data? I understand, ultimately, that the bits that make up text data represent characters according to some encoding scheme (ASCII, UTF-8, etc.) but then what does "binary" data represent? – Zakareya Alatoli Jan 09 '22 at 21:55
  • Binary data has no intrinsic meaning. It means whatever the program that wrote it and the program that read it expect it to mean. You can download documentation from Microsoft on what each byte of a .exe file means. You can read the Java Virtual Machine Specification to learn what each byte of a Java .class file means. Documentation from Adobe will explain how to read a PDF file. – Frank Yellin Jan 10 '22 at 03:58
  • I see. The "binary vs text" terminology is misleading because technically all data is binary and that data can represent anything. It could be text, numbers, machine instructions, a long list of ex-lovers, etc. – Zakareya Alatoli Jan 10 '22 at 16:30