
I am trying to figure out how data is stored, so I have a little program:

section .data
    msg1 dq 'a'
    msg2 dq 'b'
    msg3 dq 'ab'

GDB shows:

(gdb) p/t (int)msg1
$1 = 1100001
(gdb) p/t (int)msg2
$2 = 1100010

But:

(gdb) p/t (int)msg3
$3 = 110001001100001

I expected msg3 to be just msg1 + msg2 = 110000101100010, but it is 110001001100001. Why?

asd
  • _"I expect that msg3 should be just `msg1 + msg2 = 1100001110001001100001`"_ How do you get from `1100001 + 1100010` to `1100001110001001100001`? `dq 'ab'` is essentially the same as `db 'a','b',0,0,0,0,0,0` – Michael Jun 07 '21 at 10:13
  • Remember that x86 is little-endian, and that printing order is first letter at lowest address. – Peter Cordes Jun 07 '21 at 10:21
  • You seem to have missed a zero in your expected value. Surely you mean `110000101100010`? Also, a simple addition would not result in concatenation of bytes. For that you would need something like `msg1 + (msg2 << 8)`. – Michael Jun 07 '21 at 10:26
  • @Michael Yes, I missed a zero, but I still don't understand why it should be `msg1 + (msg2 << 8)` – asd Jun 07 '21 at 10:56
  • It's `'a' + ('b'<<8)` because you put `a` first, i.e. at a lower address, and x86 is little-endian. If that doesn't mean anything to you, google it: it's essential for understanding how a qword value is represented by bytes that are individually addressable. – Peter Cordes Jun 07 '21 at 11:06
  • Also note that in NASM / YASM syntax, `msg1` is the *address* of the label, so `msg1 + msg2` would be the sum of the addresses, which is nonsense. Or if you mean the values, then `+` is normally integer addition. You can't add strings, that's not a real thing, and in assembly language (including YASM macros), GDB, or C, `+` isn't used as a string-concatenation operator. (Although maybe in GDB's Python scripting language?) Clearly `dq 'ab'` isn't `dq 'a' + 'b'` - that's some other integer value. – Peter Cordes Jun 07 '21 at 11:11
  • Also note that it depends on the assembler used. MASM treated `DB "abcdefgh"` and `DQ "abcdefgh"` differently, but newer assemblers treat **character constants** and **strings** with the same endianness. See [Character constants](https://euroassembler.eu/eadoc/#CharNumbers). – vitsoft Jun 07 '21 at 11:18
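
To make the point from the comments concrete, here is a minimal Python sketch of the byte layout, not the assembler's actual implementation. It assumes what Michael describes (`dq 'ab'` emits `'a','b'` followed by six zero bytes) and that GDB's `(int)` cast reads the first 4 bytes as a little-endian integer on x86. The helper names `dq_bytes` and `as_int` are made up for illustration.

```python
# Sketch of how `dq 'ab'` is laid out in memory and read back on little-endian x86.
# Assumption: characters are stored in source order, then zero-padded to 8 bytes.

def dq_bytes(text: str) -> bytes:
    """Hypothetical helper: the 8 bytes a `dq 'text'` directive would emit."""
    raw = text.encode("ascii")
    if len(raw) > 8:
        raise ValueError("a qword holds at most 8 bytes")
    return raw.ljust(8, b"\x00")

def as_int(data: bytes, size: int = 4) -> int:
    """Hypothetical helper: read the first `size` bytes as a little-endian integer."""
    return int.from_bytes(data[:size], "little")

msg1 = as_int(dq_bytes("a"))
msg2 = as_int(dq_bytes("b"))
msg3 = as_int(dq_bytes("ab"))

print(format(msg1, "b"))            # 1100001
print(format(msg2, "b"))            # 1100010
print(format(msg3, "b"))            # 110001001100001
print(msg3 == msg1 + (msg2 << 8))   # True
```

The byte at the lowest address (`'a'`) becomes the least significant byte of the integer, so `'b'` ends up in the higher-order position. That is why the printed value is `'a' + ('b' << 8)` rather than `('a' << 8) + 'b'`.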
