endianness doesn't affect writing but reading in memory

Question

I've come to the conclusion that writing works the same way on both little endian and big endian systems.

We write to memory from left to right, which means that the number 0x00FF will be written as the following in both systems:

1000:00

1001:FF

However, reading differs depending on endianness.

In little endian we will read those two bytes

1000:00

1001:FF

as 0xFF00, and in big endian we will read them as 0x00FF.

Now you may ask why, then, if I do something like:

mov word [esp],0x00FF

on a little endian processor the result is 0x00FF, when I said that on little endian the result would be 0xFF00. That seems to completely debunk what I've said.

Well, it seems like the assembler just reversed the number to 0xFF00. Take a look:

If the assembler didn't reverse the number, we would read it as 0xFF00.

So basically, because the assembler reversed it, we write the number as

1000:FF

1001:00

in memory, and we start reading it from the least significant byte, so we get 0x00FF.

Am I right, or does it work differently?

*the number 0x00FF will be written as the following in both systems*. Not true if you write it as a word in a single instruction and not a byte at a time. — lurker, Dec 12 '21 at 14:40
How would the computer system work if it wrote bytes in one format and read them in another? The whole point of memory is to be able to round trip values -- something written should come back as the same value when read. Both endian systems write bytes low to high (address wise), just that the low bytes on little endian are the less significant (both reading and writing) of larger word sizes. — Erik Eidt, Dec 12 '21 at 14:54
yeah but what i'm saying is that if you write 0x00FF the assembler will convert it to 0xFF00 for the little endian processor and when we read it, it will be 0x00FF because we start from the least significant byte in the little endian. i dont see any other way for it to work — AngryJohn, Dec 12 '21 at 15:04
@AngryJohn the assembler does not *convert* 0x00FF to 0xFF00. What happens is that in a little endian architecture, if you have a word whose value is 0x00FF that you are writing to memory, it writes the LSB to the lower address and the MSB to the higher address. Reading it back is consistent with this. To say that 0x00FF is "converted" to 0xFF00 is saying that the bytes are swapped so that FF is the MSB, but that's not the case. As I mentioned in my first comment, your statement that it is written the same way in both systems (00 written to address 1000 and FF written to address 1001) is incorrect. — lurker, Dec 12 '21 at 17:02
@AngryJohn I can't explain the picture since I don't know where your picture came from, and I don't know what instructions were executed on each platform that resulted in that picture. You haven't provided enough information. — lurker, Dec 12 '21 at 17:44
@lurker that's the result of `mov word [esp],0x00FF` in hex editor — AngryJohn, Dec 12 '21 at 17:48
Two octets as `FF` THEN `00` is simply not the same as the 16-bit word `FF00`. If you find `FF` then `00` in memory this does not tell you what endianness was used to write it that way, and so you cannot tell whether it is either `FF00` or `00FF` without applying the endianness. — ecm, Dec 12 '21 at 18:37
Near duplicate of [Are machine code instructions fetched in little endian 4-byte words on an Intel x86-64 architecture?](https://stackoverflow.com/a/68229991) - x86 machine code stores immediates in little-endian, so store-immediate instructions copy their immediate unchanged from their machine-code to the destination. — Peter Cordes, Dec 12 '21 at 22:24

Erik Eidt · Accepted Answer · 2022-01-03T21:42:25.993

Endianness is a relationship between a numeric value that spans multiple storage units (usually bytes) and the sequence of those units in memory. It is expressible as a pair of formulas: one for decomposing a single multi-byte value into a sequence of bytes, and one for recomposing that sequence of bytes back into a single value.

(Endianness doesn't tell us how the processor performs these operations, just that they work according to the formulas below. So, specifically, we don't know what ordering in time is used to fulfill the formulas; the formulas are independent of time and sensitive only to the ordering of the bytes in the sequence.)

For example, suppose we have a 16-bit value, 0x1234, and are going to store it in memory as a sequence of two bytes: a lower byte, stored at the lower address, and a higher byte, stored at the higher address, where higher address = lower address + 1. Here "lower" and "higher" refer to the address, not to the significance of the byte.

The following formulas decompose the value using little endian:

lower byte  = 0x1234 & 0x00FF            = 0x34
higher byte = 0x1234 / 256 = 0x1234 >> 8 = 0x12

The little endian recomposition formula is

value = lower byte + higher byte * 256 = 0x34 + 0x12 * 256 = 0x1234
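The little endian formulas above can be checked with a short Python sketch. The function names `decompose_le` and `recompose_le` are made up for illustration; "lower" and "higher" again refer to addresses.

```python
def decompose_le(value):
    """Split a 16-bit value into (lower-address byte, higher-address byte)."""
    lower = value & 0x00FF   # least significant byte goes to the lower address
    higher = value >> 8      # most significant byte goes to the higher address
    return lower, higher


def recompose_le(lower, higher):
    """Rebuild the 16-bit value from the two bytes."""
    return lower + higher * 256


lower, higher = decompose_le(0x1234)
print(hex(lower), hex(higher))           # 0x34 0x12
print(hex(recompose_le(lower, higher)))  # 0x1234
```

Applied to the value from the question, `decompose_le(0x00FF)` gives `(0xFF, 0x00)`: FF at the lower address, 00 at the higher one.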

For big endian, the formulas (as compared with little endian) swap which byte is multiplied/divided:

lower byte  = 0x1234 / 256    = 0x1234 >> 8 = 0x12
higher byte = 0x1234 & 0x00FF =               0x34

And recomposition:

value = lower byte * 256 + higher byte = 0x12 * 256 + 0x34 = 0x1234
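As a sanity check, here is a minimal Python sketch of the big endian pair of formulas, compared against Python's built-in `int.to_bytes` and `int.from_bytes`. The helper names `decompose_be` and `recompose_be` are hypothetical and exist only for this illustration.

```python
def decompose_be(value):
    """Split a 16-bit value into (lower-address byte, higher-address byte)."""
    lower = value >> 8        # most significant byte goes to the lower address
    higher = value & 0x00FF   # least significant byte goes to the higher address
    return lower, higher


def recompose_be(lower, higher):
    """Rebuild the 16-bit value from the two bytes."""
    return lower * 256 + higher


value = 0x1234
sequence = bytes(decompose_be(value))
print(sequence.hex())                      # 1234
print(value.to_bytes(2, "little").hex())   # 3412, for comparison

# The hand-written formulas agree with the built-in conversions.
assert sequence == value.to_bytes(2, "big")
assert recompose_be(*sequence) == int.from_bytes(sequence, "big") == value
```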

These formulas are built into the processor and known in advance, so when the assembler is assembling data as in:

.data
dw 0x1234

It knows that (1) this is 16-bit data and (2) the target hardware is little endian. So, it will put 0x34, 0x12 as bytes in memory, following the formulas for little endian decomposition. (Again, it is not time ordering but relative sequencing.)
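What the assembler does for `dw 0x1234` can be imitated with Python's standard `struct` module, where `<H` means "unsigned 16-bit, little endian" and `>H` means "unsigned 16-bit, big endian". This is only a stand-in for the assembler's data emission, not how any particular assembler is implemented.

```python
import struct

# Bytes a little endian target gets for "dw 0x1234", in address order.
little = struct.pack("<H", 0x1234)
print(little.hex())   # 3412

# The same value emitted for a hypothetical big endian target.
big = struct.pack(">H", 0x1234)
print(big.hex())      # 1234

# Unpacking with the matching format recovers the original value.
assert struct.unpack("<H", little)[0] == 0x1234
assert struct.unpack(">H", big)[0] == 0x1234
```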

For instructions, the assembler encodes each instruction and any immediates it needs according to the machine code instruction set architecture. When an immediate is materialized, it comes back as part of instruction decoding. Because of the way Intel processors work, the immediate inside the machine code instruction also appears little endian, although the encoding may be shorter than the full size of the immediate written in assembly language. No matter: the processor reconstitutes the proper constant internally and then uses it. If an immediate is stored to memory, the processor uses the little endian decomposition formulas to create the sequence of two bytes to store, just as it would when storing a register's value to memory.

Because the formulas come as a pair (decomposition/recomposition), reading a 16-bit location that was last written with a 16-bit value returns the original value. Only when we view that location as individual bytes do we need to be concerned with endianness.
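To make this concrete for the instruction in the question: in 32-bit mode, `mov word [esp],0x00FF` should assemble to the bytes `66 C7 04 24 FF 00` (operand-size prefix, opcode, ModRM, SIB, then the 16-bit immediate). That encoding is my expectation from the x86 instruction format, so treat it as an assumption and check it against your own assembler's listing. The Python sketch below only models the bookkeeping: the immediate is decoded from the instruction bytes, then stored with the little endian formulas.

```python
# Assumed encoding of "mov word [esp],0x00FF" in 32-bit mode.
machine_code = bytes.fromhex("66 C7 04 24 FF 00")

# The last two bytes are the immediate, as it sits in the instruction stream.
imm_bytes = machine_code[-2:]
print(imm_bytes.hex())            # ff00

# Decoding recomposes the value using the little endian formula.
imm = int.from_bytes(imm_bytes, "little")
print(f"{imm:#06x}")              # 0x00ff

# Storing decomposes it again; the bytes in memory match the bytes in the code.
memory = bytearray(2)
memory[0:2] = imm.to_bytes(2, "little")
print(memory.hex())               # ff00
```

So the `FF 00` seen in a hex dump is not the value 0xFF00; it is the value 0x00FF laid out least significant byte first, both in the instruction and at the destination.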
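The round trip can be sketched with a toy memory model in Python. The helpers `store16` and `load16` are hypothetical names for this illustration, and the `byteorder` argument stands in for the endianness that a real processor has built in.

```python
def store16(memory, addr, value, byteorder):
    """Write a 16-bit value as two bytes starting at addr."""
    memory[addr:addr + 2] = value.to_bytes(2, byteorder)


def load16(memory, addr, byteorder):
    """Read two bytes starting at addr back into a 16-bit value."""
    return int.from_bytes(memory[addr:addr + 2], byteorder)


for order in ("little", "big"):
    mem = bytearray(2)
    store16(mem, 0, 0x00FF, order)
    # The byte layout differs, but the value read back is the same.
    print(order, mem.hex(), f"{load16(mem, 0, order):#06x}")
# little ff00 0x00ff
# big 00ff 0x00ff

# Only mixing the two conventions produces the swapped value.
mem = bytearray(2)
store16(mem, 0, 0x00FF, "little")
print(f"{load16(mem, 0, 'big'):#06x}")   # 0xff00
```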

Unfortunately, the debugger dumps memory as individual bytes, exposing us to endianness when the data is multi-byte data. There is no way to tell from a memory dump alone what kind of values are stored there, whether 16-bit or 8-bit values. That information, however, is in the program and its machine code instructions, in the way they treat those locations (whether they use 16-bit memory accesses or 8-bit ones).

When the program consistently uses the same memory the same way, it will get the expected values. But there are lots of opportunities for logic errors in assembly programs. Such errors include using the wrong size, using the wrong sign, and failing to initialize. Higher level languages have types that prevent the first two, and good languages also have features to detect uninitialized variables. But in machine code, every single instruction has to repeat the relevant treatment of physical storage to achieve consistency.

(To be clear, it is not always an error to view a 16-bit value as a sequence of bytes; sometimes that is necessary, e.g. when storing a number in a file, or inside an assembler or compiler.)
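A small Python sketch of that ambiguity: the same two dumped bytes yield three different readings depending on what the reader assumes about them. The `dump` contents are taken from the question's hex editor result.

```python
# Two bytes as a debugger or hex editor would show them, lowest address first.
dump = bytes([0xFF, 0x00])

# Reading 1: two independent 8-bit values.
as_bytes = list(dump)
print(as_bytes)                     # [255, 0]

# Reading 2: one 16-bit value written by a little endian store.
as_little = int.from_bytes(dump, "little")
print(f"{as_little:#06x}")          # 0x00ff

# Reading 3: one 16-bit value written by a big endian store.
as_big = int.from_bytes(dump, "big")
print(f"{as_big:#06x}")             # 0xff00
```

Nothing in the dump itself says which reading is right; only the instructions that access the location do.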
