10

I know how computers translate numbers to binary. But what I don't understand is that I've heard that computers translate everything (words, instructions, ...) to binary, not just numbers. How is this possible?

Could you show me some examples? Like how does a computer translate the letter "A" to binary?

And when computers see a binary code, how can they know if that long string of 0s and 1s represents a number or a word or an instruction?

.

Example:

Let's say that a computer programmer encoded the letter "Z" so that it translates to this binary string: 11011001111011010111

So when the computer encounters this binary string, it will translate it to the letter "Z".

But what happens when we ask this computer "what is the product of 709 and 1259?"

The computer would answer us "892631". But that number, when translated to binary, is 11011001111011010111.

So how would it make a difference between "Z" and "892631"?

.

Please note that I don't know much about computer science, so please explain everything in simple terms.

user50746
  • related: [How does binary translate to hardware?](//stackoverflow.com/q/1518177) and [How does an assembly instruction turn into voltage changes on the CPU?](//stackoverflow.com/q/3706022) – Peter Cordes May 06 '19 at 22:22

5 Answers

10

Computers don't actually translate anything to binary; it's all binary from the start, and the computer never knows anything other than binary.

The character A stored in memory would be 01000001, and the computer doesn't see that as anything but a binary number. When we ask the computer to display that number as a character on the screen, it will look up the graphical representation for it in a font definition to find some other binary numbers to send to the screen hardware.

For example if the computer was an eight bit Atari, it would find eight binary values to represent the character A on the screen:

00000000
00011000
00111100
01100110
01100110
01111110
01100110
00000000

As you can see, the binary values translate to dark and bright pixels when the graphics hardware draws the character on the screen.
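To make that concrete, here is a minimal C sketch (not actual Atari code; the glyph bytes are simply the eight values shown above) that prints a '#' for each 1 bit and a space for each 0 bit, which is essentially what the graphics hardware does with bright and dark pixels:

```
#include <stdio.h>

int main(void) {
    /* The eight bytes of the 8x8 glyph for 'A' shown above. */
    unsigned char glyph[8] = {0x00, 0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x00};

    for (int row = 0; row < 8; row++) {
        for (int bit = 7; bit >= 0; bit--) {
            /* A 1 bit becomes a bright "pixel" ('#'), a 0 bit a dark one (' '). */
            putchar((glyph[row] >> bit) & 1 ? '#' : ' ');
        }
        putchar('\n');
    }
    return 0;
}
```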

Similarly, whatever we do with the numbers in the computer, it's all ways of moving binary values around, doing calculations on binary values, and translating them to other binary values.

If, for example, you take the character code for A and want to display it as a decimal number, the computer calculates that the decimal representation of the number is the digits 6 (110) and 5 (101), translates those to the character 6 (00110110) and the character 5 (00110101), and then translates those into their graphical representations.
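Here is a small C sketch of that digit-splitting step (the variable names are made up for illustration):

```
#include <stdio.h>

int main(void) {
    unsigned char code = 65;      /* 01000001, the character code for A */

    int tens = code / 10;         /* 6, binary 110 */
    int ones = code % 10;         /* 5, binary 101 */

    /* Adding the code for '0' (00110000) turns a digit value into its
       character code: 6 -> '6' (00110110), 5 -> '5' (00110101). */
    char tens_char = '0' + tens;
    char ones_char = '0' + ones;

    printf("%c%c\n", tens_char, ones_char);   /* prints 65 */
    return 0;
}
```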

Guffa
9

That is an excellent question, and one which would take years and several PhDs to fully explain. I can offer you a simplistic answer, but to fully understand you will have to do MUCH more research. Might I suggest some free online classes from MIT on the subject here.

At the lowest level, the letter A and the number 65 are in fact stored using the same sequence of 0's and 1's. 1000001 if I'm not mistaken.

The computer then decides what it is when it grabs it from memory. This means that letters can be displayed as numbers, and vice versa.

The way the computer knows what it's looking for is that the programmer tells it what it's looking for. The programmer says "I want a number stored at such and such location", and the computer goes and looks for it.

Let's step up a level, because programmers rarely program at such a low level any more. Other programs (usually compilers, which take code like C++ and turn it into something the computer can understand) ensure that the location we are accessing is in fact what we said it is. They have extra information that tells them that this particular set of 1's and 0's is actually a floating-point type (has a decimal point) whereas this set is an integer (no decimal point).

Then other types build on those types: bigger integers, floating-point numbers, or strings of characters, and again the compilers enforce the types.
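To illustrate that type information, here is a small C sketch (assuming float and unsigned int are both 4 bytes, as on typical desktop platforms) showing that the same 32 bits mean completely different things depending on the type the programmer declared:

```
#include <stdio.h>
#include <string.h>

int main(void) {
    float f = 1.0f;        /* the compiler knows f holds a floating-point value */
    unsigned int i = 0;    /* the compiler knows i holds an integer */

    /* Copy the raw bits of the float into the integer, unchanged. */
    memcpy(&i, &f, sizeof i);

    /* Same 32 bits, two different interpretations: */
    printf("as float: %f\n", f);   /* 1.000000 */
    printf("as int:   %u\n", i);   /* 1065353216 on IEEE-754 machines */
    return 0;
}
```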

This is an oversimplification, and I realize that everything here isn't exactly correct, but it'll get you on the right path. You might check out some of these topics to get a much better idea:

How instructions are differentiated from data?

http://en.wikipedia.org/wiki/Computer_data_storage

How is data, address and Instruction differentiated in Processor/Register/memory?

http://en.wikipedia.org/wiki/Reference_(computer_science)

Hope this clears things up a little. Feel free to ask for clarification!

Jared Wadsworth
    Building on this answer, you've got Binary which is the 0's and 1's, and is working right on the hardware. A further level of abstraction turns it into Assembly, which contains simple instructions such as ADD, SUB, DIV, MUL, etc and explains *how* the binary should interact. This was still very error prone and eventually you had simple programming languages with grammars and syntax, which are then **compiled** into assembly and binary, translating the human words into machine language. – Kyle Baran Oct 08 '14 at 22:34
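To make the comment above concrete, here is a tiny hypothetical example: one line of C and, in the comments, a rough x86-style assembly rendering of it (illustrative only, not exact compiler output):

```
/* One line of C... */
int add(int a, int b) {
    return a + b;
}

/* ...might compile to a handful of assembly instructions, roughly:
 *
 *     mov eax, edi    ; copy the first argument into a register
 *     add eax, esi    ; add the second argument to it
 *     ret             ; return; the result is left in eax
 *
 * and each of those mnemonics is itself stored as a short binary code
 * (machine code) that the CPU executes directly.
 */
```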
5

So how would it make a difference between "Z" and "892631"?

It doesn't. To the computer, everything is 0s and 1s. The raw bits have no meaning until the processor is TOLD what to do with those 0s and 1s!

For example, I could create a variable x and make its value 0b01000001 (0b means "this is a number I am describing in binary"). I could then ask the processor to print variable x to the screen for me. But I FIRST must tell the processor WHAT x is!

printf("%d", x); // this prints the decimal number 65

printf("%c", x); // this prints the character A

So x by itself means nothing, except the raw bits 01000001. But as the programmer it is my job to tell the computer what x really means.
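Here is a complete, compilable version of the snippet above (the value is written as 0x41, since the 0b prefix is a compiler extension; the variable name x is just for illustration):

```
#include <stdio.h>

int main(void) {
    int x = 0x41;   /* the bit pattern 01000001; some compilers also accept 0b01000001 */

    printf("%d\n", x);   /* interpreted as a number: prints 65 */
    printf("%c\n", x);   /* interpreted as a character: prints A */
    return 0;
}
```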

Chris
2

A computer using ASCII needs only 7 bits to store letters and special characters, whereas it uses all 8 bits of a byte when storing a number.

Let us take "A" AND "65" as examples.

To convert 65 to binary, divide by 2 repeatedly and collect the remainders:

65 / 2 = 32, remainder 1
32 / 2 = 16, remainder 0
16 / 2 =  8, remainder 0
 8 / 2 =  4, remainder 0
 4 / 2 =  2, remainder 0
 2 / 2 =  1, remainder 0
 1 / 2 =  0, remainder 1

Reading the remainders from bottom to top gives 1000001, which is 65 in binary (the leading 1 is worth 2 to the power of 6, i.e. 64, and the trailing 1 is worth 2 to the power of 0, i.e. 1).
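A minimal C sketch of that same repeated-division procedure (variable names are just for illustration):

```
#include <stdio.h>

int main(void) {
    int n = 65;          /* the number we want in binary */
    char bits[32];
    int len = 0;

    /* Divide by 2 repeatedly; each remainder is one binary digit,
       produced from the least significant end first. */
    while (n > 0) {
        bits[len++] = '0' + (n % 2);
        n = n / 2;
    }

    /* Print the remainders in reverse order: 1000001 for 65. */
    for (int i = len - 1; i >= 0; i--)
        putchar(bits[i]);
    putchar('\n');

    return 0;
}
```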

The ASCII value for the letter A is stored as 01000001 in binary format (it uses only 7 bits; the 8th bit is 0 for letters and special characters).

I hope this helps.

Siva
  • UTF-8 is a widely used encoding for characters, including "special characters", and letters in non-Latin alphabets. It uses all 8 bits with a variable-length encoding (1 to 4 bytes per character). The number of leading bits set to 1 = total bytes in a multi-byte character. https://en.wikipedia.org/wiki/UTF-8#Description – Peter Cordes Jun 22 '18 at 17:14
  • Your statement about _only 7 bits for storing letters/special characters_ is just wrong. The outdated 7-bit US-ASCII code is one of the few for which this claim holds. Your favorite Windows, Linux or MacOS box likely uses Windows1252, one of the many ISO-8859 variations, or UTF-8, all of which use the full set of 8-bit codes. Btw. there are also 5-bit codes around and even curiosities like https://en.wikipedia.org/wiki/DEC_Radix-50. – Adrian W Jun 22 '18 at 17:22
2

Let us discuss some basics here:

  1. Assume your hard drive is nothing but a circular aluminium platter covered with tiny magnetised spots (visible only under a microscope). One spot is one bit, and the spots are grouped into bytes of 8 bits.
  2. RAM is similar to the hard drive, but it is a semiconductor made of silicon, so it can store information as an electric field and has an address for each byte, which makes it faster.
  3. The computer stores all the information you enter via the keyboard on your hard drive as magnetic pulses (represented as 1 for human understanding). If there is no pulse, the spot is empty and represents 0.

Let us discuss the first part of your question - Could you show me some examples? Like how does a computer translate the letter "A" to binary?

  1. For instance, you enter the characters 'A' and 'அ' via the keyboard.
  2. The character 'A' is represented as 65 in Unicode/ASCII, which is 01000001 in base-2 binary. The OS does the mapping of A to binary. The character 'A' you entered is now stored on the hard disk as 01000001 and occupies 8 spots (for example, no magnetic pulse for the leftmost 0, a magnetic pulse for the 1 in the next bit, and so on).
  3. In the case of RAM, it stores the information in the form of electrical pulses, and hence RAM loses all the information when the power is switched off.

Now, everything you see in RAM or on the hard drive is energy or no energy in a given bit, and we call it binary format for human understanding (let us call it 0 for no energy and 1 for energy).

It is now up to the compiler how it has to be stored. If it is a C compiler on an AMD processor/Windows OS, it stores the value in 2 bytes (one byte for the 5 and one byte for the 6). The byte storing the 5 will be on the right side of the 6 on an AMD processor - this is called little endian. A C program does not support the character 'அ', as it requires more than 1 byte to store international characters.

If it is a Java compiler, it uses a variable-length encoding of up to 4 bytes called UTF-16. In the case of the letter 'A' it requires 1 byte, as the Unicode/ASCII representation is 65. Whereas if you are storing an international character such as 'அ' (similar to A in the Tamil language), the corresponding Unicode value is 2949 and the corresponding binary value is 11100000 10101110 10000101 (3 bytes). Java has no issues storing and reading 'A' and 'அ'.

Now imagine that you have stored the character 'அ' on the hard drive using Java/Windows/AMD processor as a character type (char).

Now imagine you want to read this from a C program as a char. The C compiler supports only ASCII, not the complete Unicode set. Here, C will read the rightmost byte (10000101) of the above 3 bytes (for the char type it reads 1 byte). What do you get on the screen? Your C program is going to read this 1 byte without any issue and will paint this � on your screen if you ask your program to print it. So the compiler is the difference maker.
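A hedged C sketch of that situation (the three byte values are simply the UTF-8 bytes quoted above; exactly what a terminal shows for a lone byte depends on its character set):

```
#include <stdio.h>

int main(void) {
    /* The three UTF-8 bytes of 'அ' (U+0B85) quoted above. */
    unsigned char bytes[3] = {0xE0, 0xAE, 0x85};   /* 11100000 10101110 10000101 */

    /* A C char holds just one byte, so reading these one at a time never
       sees a whole character; none of the bytes is a valid character on
       its own, which is why a terminal typically shows a replacement
       symbol such as � for them. */
    for (int i = 0; i < 3; i++)
        printf("byte %d: %d\n", i, bytes[i]);

    return 0;
}
```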

Let us discuss the second part of your question now: *And when computers see a binary code, how can they know if that long string of 0s and 1s represents a number or a word or an instruction?*

Now, you are loading your compiled Java program into RAM, into the text and data areas (at a high level, RAM is split into a text area and a data area). Asking the ALU of the processor to execute your program's set of instructions is called a process.

A line in your compiled program is an instruction, for example to move data from one variable to another.

When the ALU executes the first instruction, the data goes into the corresponding registers, which sit outside of the RAM. The processor has a set of registers for data and a set of registers for instructions. The ALU knows which register is for what, and based on that it performs your instruction.

Hope this helps.

Siva
    There are some oversimplifications here, but also some mistakes. In the same paragraph you talk about Java using UTF-16, you say that அ is represented as `11100000 10101110 10000101` (3 bytes). That's obviously not true because [UTF-16](https://en.wikipedia.org/wiki/UTF-16) codes unicode codepoints as one or more 2-byte chunks. The bit pattern you show looks like the UTF-8 encoding for that code-point, based on the 3 leading `1` bits in the first byte indicating a 3-byte character. – Peter Cordes Dec 03 '18 at 08:16
    Also, *The OS does the mapping of A to Binary.* is a bit weird. Everything is binary inside a computer. The input to the mapping is a scancode from the keyboard. (Or the USB keyboard driver). The terminal driver, or GUI event deliverer, will map keypresses to their ASCII or UTF-8 or UTF-16 codes, or whatever character set. Or to unicode codepoints and then encode into UTF-8 from there. – Peter Cordes Dec 03 '18 at 08:18
  • Thanks, Peter. You are right on your points. I am well aware of how the given key press is converted into 11 bit scan code (Start Bit, Data, Parity Bit and Stop Bit) and sent as bit stream on PS/2 or USB, which is then mapped into the corresponding ASCII or UTF based on the character set we choose in the Control Panel. I did not want to deep dive on this so I over simplified it by stating it as OS. – Siva Dec 03 '18 at 15:40
  • Peter, thanks again. I again oversimplified the fact that the international character requires 3 bytes in this case, as the corresponding decimal value is 2949 and hex is 0xb85. I meant it requires at least 3 bytes, but technically, as you said, it takes 4 bytes if it is UTF-16, which uses sets of 2 bytes. In this case, it occupies 4 bytes and the leftmost ones will be zeros. Most people assume that Java is UTF-8, but that is not true in the case of character or string, which is UTF-16 as you said. Thanks. I will make my article more precise going forward. – Siva Dec 03 '18 at 15:55