The 8086 addressing model combined a 16-bit segment and a 16-bit offset as segment * 16 + offset.
The minimum address is 000000h, the maximum one is 10ffefh.
While the latter is technically a 21-bit value, the CPU had only 20 address lines, so the highest accessible address was 0fffffh¹.
Addresses above 0fffffh simply wrapped around², so 10ffefh is an alias for 0ffefh.
Some programs began to rely on that.
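The arithmetic above can be checked with a few lines of Python. This is only a sketch of the address calculation, not a model of the CPU; the function names `linear_address` and `bus_address_8086` are made up for illustration.

```python
# Hypothetical helper names; only the arithmetic comes from the text above.

def linear_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset as segment * 16 + offset."""
    return (segment & 0xFFFF) * 16 + (offset & 0xFFFF)


def bus_address_8086(segment: int, offset: int) -> int:
    """Keep only the 20 bits that the 8086 address bus can carry."""
    return linear_address(segment, offset) & 0xFFFFF


print(f"{linear_address(0xFFFF, 0xFFFF):06x}")    # 10ffef, a 21-bit value
print(f"{bus_address_8086(0xFFFF, 0xFFFF):06x}")  # 00ffef, the wrapped alias
print(f"{bus_address_8086(0xF000, 0xFFFF):06x}")  # 0fffff, the top of the 1 MiB
```

Dropping the 21st bit with a plain AND is all the "wrap around" amounts to on the 8086.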
When the 80286 came out, it had a 24-bit address bus.
An address like 10ffefh didn't wrap around any more.
Emulating the old behavior required too many transistors at the time (10ffefh cannot be masked with an AND), so the A20 mask was introduced.
As the name suggests, address line 20, the 21st bit, was ANDed with a specific bit of a specific register in the 8055/8042 chip.
The BIOS cleared that bit on startup, thereby forcing the 21st bit to zero, emulating the old behavior.
If you don't enable the A20, the 21st bit of every physical address will always be zero.
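As a rough illustration of what the gate does to an address, here is a small Python sketch. It models only the effect described above (bit 20 forced to zero when the A20 is disabled); the names `A20_BIT` and `through_a20_gate` are invented for this example and do not correspond to any real register or API.

```python
A20_BIT = 1 << 20  # address line 20, i.e. the 21st bit (hypothetical constant name)


def through_a20_gate(physical: int, a20_enabled: bool) -> int:
    """Model the gate: with the A20 disabled, bit 20 is forced to zero."""
    return physical if a20_enabled else physical & ~A20_BIT


for addr in (0x10FFEF, 0x200000, 0x300000):
    masked = through_a20_gate(addr, a20_enabled=False)
    print(f"{addr:06x} -> {masked:06x}")
# 10ffef -> 00ffef
# 200000 -> 200000
# 300000 -> 200000
```

Note that the mask is not limited to the first megabyte: with the A20 disabled, every odd megabyte aliases the even megabyte below it.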
It is possible to enable the A20 in protected mode with a flat model, which is the closest thing to "32-bit mode", but it requires care about where the code is placed in memory.
x86 assembly can be used equally well to produce 16-bit or 32-bit code, just by telling the assembler the target size³.
¹ Given by, for example, a segment of 0f000h and an offset of 0ffffh.
² The 21st bit was simply discarded.
³ Simply put, whether you are writing 16-bit or 32-bit code.