difference between text file and binary file

Question

Why should we distinguish between text files and binary files when transmitting them? Why are some channels designed only for textual data? At the bottom level, they are all bits.

Dietrich Epp · Accepted Answer · 2011-05-18T02:03:40.940

33

At the bottom level, they are all bits... true. However, some transmission channels have seven bits per byte, and other transmission channels have eight bits per byte. If you transmit ASCII text over a seven-bit channel, then all is fine. Binary data gets mangled.
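
To make the seven-bit point concrete, here is a minimal sketch in Python. The function `send_over_7bit_channel` is a hypothetical stand-in for such a channel (it simply clears the top bit of every byte), not a model of any particular protocol.

```python
def send_over_7bit_channel(data: bytes) -> bytes:
    """Simulate a hypothetical channel that drops the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


# ASCII only uses code points 0..127, so the top bit is always clear.
ascii_text = "Hello, world!\n".encode("ascii")

# The first eight bytes of a PNG file; the first byte (0x89) has its top bit set.
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

text_received = send_over_7bit_channel(ascii_text)
binary_received = send_over_7bit_channel(png_signature)

print(text_received == ascii_text)       # the ASCII text arrives unchanged
print(binary_received == png_signature)  # the binary data does not
```

The only byte that changes here is 0x89, which arrives as 0x09, but one altered byte is enough to make the file unreadable as a PNG.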

Additionally, different systems use different conventions for line endings: LF and CRLF are common, but some systems use CR or NEL. A text transmission mode will convert line endings automatically, which will damage binary files.
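
The damage from automatic line-ending conversion can be sketched the same way. `text_mode_transfer` below is a hypothetical helper that rewrites every line break as CRLF, roughly what a text transfer mode might do when the destination expects CRLF. It is an illustration under that assumption, not the behaviour of any specific tool.

```python
def text_mode_transfer(data: bytes) -> bytes:
    """Hypothetical text-mode transfer: rewrite every line break as CRLF."""
    normalized = data.replace(b"\r\n", b"\n")  # avoid doubling existing CRLFs
    return normalized.replace(b"\n", b"\r\n")


text = b"first line\nsecond line\n"

# Arbitrary binary data that happens to contain the byte values 0x0A and 0x0D.
binary = bytes([0x00, 0x0A, 0xFF, 0x0D, 0x0A, 0x10])

converted_text = text_mode_transfer(text)
converted_binary = text_mode_transfer(binary)

print(converted_text)                       # still the same two lines of text
print(len(binary), len(converted_binary))   # the binary payload changed length
```

For the text, the conversion is harmless. For the binary payload, an extra byte has been inserted, so every offset after it is shifted.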

However, this is all mostly of historical interest these days. Most transmission channels, such as HTTP, are eight-bit, and most users are fine with whatever line ending they get.

Some examples of 7-bit channels: SMTP (nominally, without extensions), SMS, Telnet, some serial connections. The internet wasn't always built on TCP/IP, and it shows.

Additionally, the HTTP spec states that,

When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body.

edited May 18 '11 at 02:03

answered May 18 '11 at 01:51

Dietrich Epp


Minor point: Technically, SMS has 7-bit char (packed), 8-bit binary and 16-bit char modes. More interesting are character set conversions on text streams. I hope EBCDIC has been replaced by Unicode mostly everywhere now, but in Olden Times (before Y2K and GWoT, eh!), one would be grateful for automatic conversion between EBCDIC and "ASCII + some weird codepage in the 0x80-0xFF range the provenance of which you can't remember". Especially in FTP "text mode". Often, it didn't work... – David Tonhofer Dec 10 '16 at 13:27
In the end "text" is a context-dependent interpretation, whereas "binary" is not. When editors (or any processes) on two systems try to read "text" from a binary file, either their conventions on what "text" is must agree, or else a conversion must be performed when the binary file is transferred between systems or when the binary file is read or written. Compare with two (imaginary) systems where one conventionally works with PNG files and the other with GIFs. HTTP transmits text but adds metadata in the form of an ASCII header and the `Content-Type` line which gives precise content info. – David Tonhofer Dec 10 '16 at 14:10
@RestlessC0bra: That's incorrect, NEL is not the same as LF. You're right that NEL is not part of ASCII. – Dietrich Epp Jul 02 '17 at 23:07
Like most characters, it's part of the Unicode standard. That's not the only character set it's found in, it's just not part of ASCII. – Dietrich Epp Jul 02 '17 at 23:13
So base64 come and save us in the 7-bit transmission channel case? – BlueMice Sep 22 '22 at 06:31
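
On the base64 question: a minimal sketch of why it helps, using Python's standard `base64` module. The 7-bit channel is again simulated by clearing the top bit of each byte, which is an assumption made for illustration.

```python
import base64

# Every possible byte value, including those with the top bit set.
payload = bytes(range(256))

# Base64 output uses only ASCII letters, digits, '+', '/' and '='.
encoded = base64.b64encode(payload)

# Simulated 7-bit channel: clear the top bit of each byte.
through_channel = bytes(b & 0x7F for b in encoded)

decoded = base64.b64decode(through_channel)

print(through_channel == encoded)  # the encoded form is untouched by the channel
print(decoded == payload)          # so the original bytes are recovered
print(len(payload), len(encoded))  # the cost: 4 output bytes per 3 input bytes
```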

score 12 · Answer 2 · edited Dec 29 '19 at 22:16


All files are saved in one of two file formats - binary or text. The two file types may look the same on the surface, but their internal structures are different.

While both binary and text files contain data stored as a series of bits (binary values of 1s and 0s), the bits in text files represent characters, while the bits in binary files represent custom data.

edited Dec 29 '19 at 22:16

Ankur Agarwal


answered Dec 04 '11 at 08:24

munendra


Possible source (well worth the read): http://fileinfo.com/help/binary_vs_text_files – Waldir Leoncio Jul 20 '15 at 21:56

Mishax · Answer 3 · 2015-05-28T10:18:02.897

An important addition to the answers already provided: text files and binary files both consist of bytes, but in a text file the bytes are understood to represent characters. The mapping of bytes to characters is applied consistently over the whole file, using a particular code page or a Unicode encoding. With 7- or 8-bit code pages you can "spin the dial" when reading these files and interpret them with an English alphabet, a German alphabet, a Russian alphabet, or others. Spinning the dial doesn't change the bytes; it changes which characters are chosen to correspond to them.
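
A small Python sketch of "spinning the dial": the same three bytes are decoded with three different 8-bit code pages. The byte values and the choice of code pages are picked purely for illustration.

```python
# Three bytes in the upper half of an 8-bit code page.
raw = bytes([0xE4, 0xF6, 0xFC])

# Decode the same bytes under a Western European and two Cyrillic code pages.
decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1251", "koi8-r")}

for codec, text in decoded.items():
    print(f"{codec:8} {text}")

# The bytes themselves never changed; only their interpretation did.
print(raw.hex())
```

Under latin-1 these bytes read as the German letters "äöü", while the Cyrillic code pages map the same bytes to Cyrillic letters.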

As others have stated, there is also the issue of the encoding of line break separators which is unique to text files and which may differ from platform to platform. The "line break" is not a letter in our alphabet or a symbol you can write, so other rules apply to it.

With binary files there is no implicit convention on character encoding or on the definition of a "line".

score 6 · Answer 4 · answered May 18 '11 at 01:57

Distinguishing between the two is important because different OSs treat text files differently. For example, in *nix you end your lines with just \n, while in MS OSs you use \r\n and on classic Mac OS you used \r (modern macOS uses \n). Software such as FTP clients tries to change the line endings of text files to match the destination OS by adding or removing characters. This ensures that the text file displays properly on the destination OS.

For example, if you create a text file in *nix with line breaks, copy it to a Windows box as a binary file, and open it in Notepad, you will not see any of the line breaks, just one run-together block of text.
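
The Notepad effect can be imitated in a few lines of Python. `split_lines_crlf_only` is a hypothetical viewer that recognises only \r\n as a line break, and `to_crlf` is a hypothetical stand-in for what a text-mode transfer would do on the way to such a system.

```python
def split_lines_crlf_only(data: bytes) -> list:
    """Hypothetical viewer that treats only CRLF as a line break."""
    return data.split(b"\r\n")


def to_crlf(data: bytes) -> bytes:
    """Hypothetical text-mode conversion from LF to CRLF line endings."""
    return data.replace(b"\n", b"\r\n")


unix_bytes = b"line one\nline two\nline three\n"

copied_as_binary = split_lines_crlf_only(unix_bytes)
copied_as_text = split_lines_crlf_only(to_crlf(unix_bytes))

print(len(copied_as_binary))  # everything ends up on one "line"
print(copied_as_text[:3])     # the three lines are separated as intended
```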

Macs use LF these days, they used to use CR. I've not heard of any system using LFCR. — Dietrich Epp, May 18 '11 at 01:59

score -2 · Answer 5 · edited Mar 01 '16 at 15:30


All machine language files are actually binary files.

To open a binary file, the file mode has to be given as "rb" or "wb" in the fopen call. Otherwise files are opened in the default mode, which is text mode.
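
The answer refers to C's fopen. As a rough illustration, here is the analogous behaviour in Python (swapped in for C here), whose built-in `open` also defaults to text mode. The file name and contents are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# "wb": write the bytes exactly as given, including the CR LF pairs.
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

# "rb": read the bytes back exactly as they are stored.
with open(path, "rb") as f:
    raw = f.read()

# "r" (text mode, the default): bytes are decoded to characters and,
# with Python's default newline handling, "\r\n" is translated to "\n".
with open(path, "r", encoding="ascii") as f:
    text = f.read()

print(repr(raw))
print(repr(text))
```

Binary mode returns the stored bytes untouched, while text mode returns characters with the line endings normalised.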

Note that text files can also be stored and processed as binary files, but not vice versa.

Binary files differ from text files in two ways:

The storage of newline characters
The EOF character

E.g.:

"wt": t stands for text file
"wb": b stands for binary file

Binary files do not store any special character at the end; the end of the file is determined from the file size itself.

edited Mar 01 '16 at 15:30

SnyersK


answered Mar 01 '16 at 14:43

user6003105


This is wrong. Text files do not generally have an "EOF character" (just open them in binary mode). The EOF may be generated by the libraries in some environments. – David Tonhofer Dec 10 '16 at 14:15
