Let us discuss some basics here:
- Think of your hard drive as a circular aluminium platter coated with magnetic material, covered in tiny spots (visible only under a microscope). Each spot stores one bit, and eight bits grouped together form one byte.
- RAM plays a similar role, but it is a semiconductor made of silicon, so it stores information as electric charge rather than magnetism, and every byte has its own address - which is why it is much faster.
- Everything you enter via the keyboard is stored on the hard drive as magnetic pulses. A magnetized spot is read as 1 (for human understanding); an unmagnetized spot is read as 0.
Let us discuss the first part of your question - *Could you show me some examples? Like how does a computer translate the letter "A" to binary?*
- For instance, you enter the characters 'A' and 'அ' via the keyboard.
- The character 'A' is represented as 65 in Unicode/ASCII, which is 01000001 in binary (base 2). The OS does the mapping of 'A' to binary. The 'A' you entered is now stored on the hard disk as 01000001, occupying 8 spots in a row (for example, no magnetic pulse for the leftmost 0, a magnetic pulse for the 1 next to it, and so on).
- RAM, on the other hand, stores the same information as electrical charges, which is why RAM loses everything when the power is switched off.
Now, everything on RAM or the hard drive is simply energy or no energy in a given bit, and we call it binary format for human understanding (let us call it 0 for no energy and 1 for energy).
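As a quick sketch of the 'A' → 65 → 01000001 mapping above, here is how you can print a character's code and its 8-bit binary form in Java:

```java
public class CharToBinary {
    public static void main(String[] args) {
        char c = 'A';
        // 'A' has code point 65; pad the binary string to 8 bits for display
        String bits = String.format("%8s", Integer.toBinaryString(c)).replace(' ', '0');
        System.out.println((int) c + " -> " + bits); // 65 -> 01000001
    }
}
```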
It is up to the compiler now how it has to be stored. If it is a C compiler on an AMD processor/Windows OS, it stores the value in 2 bytes (one byte for the 5 and one byte for the 6). On an AMD processor the low-order byte is stored first, before the high-order byte - this is called little-endian. C's plain `char` type is a single byte, so it cannot hold a character like 'அ' on its own; international characters require more than 1 byte.
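Byte order can be made visible from Java with `ByteBuffer`, which lets you lay out a two-byte value the way a little-endian CPU such as an AMD/x86 chip would in memory (this is a sketch of the memory layout, not of any particular compiler):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Endianness {
    public static void main(String[] args) {
        // Lay out the 16-bit value 0x0102 the way a little-endian CPU would
        ByteBuffer buf = ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) 0x0102);
        // The low-order byte (0x02) lands at the lower address
        System.out.printf("%02x %02x%n", buf.get(0), buf.get(1)); // 02 01
    }
}
```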
If it is a Java compiler, Java uses Unicode throughout: internally a `char` is stored as UTF-16, which takes 2 bytes for most characters (and 4 for rarer ones). The letter 'A' is code point 65, and when written out as UTF-8 it needs only 1 byte. Whereas if you are storing an international character such as 'அ' (similar to A in the Tamil language), the corresponding Unicode value is 2949, and in UTF-8 it takes 3 bytes: 11100000 10101110 10000101. Java has no issue storing and reading both 'A' and 'அ'.
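You can check those exact numbers yourself - the code point 2949 and the three UTF-8 bytes of 'அ' - with a few lines of Java:

```java
import java.nio.charset.StandardCharsets;

public class TamilBytes {
    public static void main(String[] args) {
        String a = "அ";                                // U+0B85, code point 2949
        System.out.println(a.codePointAt(0));          // 2949
        byte[] utf8 = a.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            // mask to 0..255 so the sign bit does not distort the output
            sb.append(String.format("%8s", Integer.toBinaryString(b & 0xFF))
                            .replace(' ', '0'))
              .append(' ');
        }
        System.out.println(sb.toString().trim());      // 11100000 10101110 10000101
    }
}
```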
Now imagine that you have stored the character 'அ' on the hard drive using Java/Windows/an AMD processor, as those 3 bytes.
Now imagine you want to read this using a C program as a `char`. C's `char` reads exactly 1 byte and supports ASCII, not the complete Unicode set. Here, C will read only one of the 3 bytes above, say the rightmost one (10000101). What do you get on the screen? Your C program reads this 1 byte without any issue and, if you ask it to print, paints � on your screen, because a lone byte like that is not a valid character by itself. So the compiler is the difference maker.
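We can reproduce that � from Java as well: decoding the last byte of 'அ' on its own, as if it were a complete character, yields the Unicode replacement character:

```java
import java.nio.charset.StandardCharsets;

public class LoneByte {
    public static void main(String[] args) {
        // The last of the three UTF-8 bytes of 'அ', taken in isolation
        byte[] one = { (byte) 0x85 };
        // An invalid UTF-8 sequence decodes to the replacement character U+FFFD
        String s = new String(one, StandardCharsets.UTF_8);
        System.out.println(s); // prints �
    }
}
```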
**Let us discuss the second part of your question now:**
*And when computers see a binary code, how can they know if that long string of 0s and 1s represents a number or a word or an instruction?*
Now, you load your compiled Java program into RAM, into the text and data areas (at a high level, a process's memory in RAM is split into a text area and a data area). Asking the ALU of the processor to execute that set of instructions is what we call a process.
Each line in your compiled program is an instruction, such as moving data from one variable to another.
When the ALU executes an instruction, it fetches it into the corresponding registers sitting outside of the RAM. The processor has one set of registers for data and another for instructions, including a program counter that always points at the next instruction. That is how the hardware knows what the bits mean: whatever the program counter points at is decoded as an instruction, and whatever that instruction reads or writes is treated as data. Based on which register holds what, the ALU performs your instruction.
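The fetch/decode idea above can be sketched as a toy machine in Java. This is a hypothetical 3-opcode instruction set invented for illustration, not a real CPU: the same array of numbers holds both instructions and data, and only the program counter decides which is which.

```java
public class ToyCpu {
    // Hypothetical opcodes: 1 = LOAD next cell into acc, 2 = ADD next cell, 0 = HALT
    static int run(int[] memory) {
        int pc = 0;   // program counter: whatever it points at is an instruction
        int acc = 0;  // accumulator register
        while (true) {
            int opcode = memory[pc++];            // fetched and decoded as an instruction
            switch (opcode) {
                case 1: acc = memory[pc++]; break;  // the next cell is read as data
                case 2: acc += memory[pc++]; break;
                case 0: return acc;
                default: throw new IllegalStateException("unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // LOAD 5; ADD 6; HALT - the 5 and 6 are data only because
        // they follow opcodes that consume an operand
        int[] program = { 1, 5, 2, 6, 0 };
        System.out.println(run(program)); // 11
    }
}
```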
Hope this helps.