
I'm playing around with memory as a sort of academic exercise and trying to write values to the memory much like I'd do in assembly, basically by giving it a size (1,2,4,8) and a value and a memory address, something like:

movb $7,   -1(%rbp)      size=1; value=7,   address=-1 (pretend rbp=0)
movw $777, -4(%rbp)      size=2; value=777, address=-4

When doing this sort of thing in C, do I need to create multiple pointers of different types to do these random-sized writes? For example, currently I'm doing it like this:

char* play_mem_block(size_t bytes)
{
    char *mem_block = malloc(bytes);
    if (!mem_block) {
        perror("Looks like there was an error");
        return NULL;
    }

    // write a one-byte char value at address 0
    *mem_block = 'a';

    // write a four-byte (int) value
    int *mem_block_for_int = (int*) mem_block;
    *(mem_block_for_int + 1) = 4444;

    /* *(mem_block+3) = 4444;     <-- possible to do this directly? */

    return mem_block;

}
Peter Cordes
samuelbrody1249
  • Asm like that is how C structs work; casting to a struct type with the layout you want is one option. Beware of strict-aliasing rules in C if you want to do overlapping accesses with different types; use `memcpy` or in GNU C `typedef uint32_t unaligned_aliasing_u32 __attribute__((aligned(1),may_alias));` – Peter Cordes Feb 13 '21 at 04:48
  • @PeterCordes I see, what if just something random-access like : `*(mem_block+77) = 1234;` is that possible to do directly in C? – samuelbrody1249 Feb 13 '21 at 04:53
  • `mem_block` has type `char*`, and `1234` doesn't fit in one char. What access width were you hoping that to have? `*(mem_block+77) = ...` would require the right hand side to implicitly convert to `char`. You can of course cast the pointer type (but that would violate `alignof(int)` if you meant `int`) or use memcpy. – Peter Cordes Feb 13 '21 at 04:55
  • If you're bashing around at a low level you need to be aware of the type of value you want to write and how it may need to be converted. If `1234` is supposed to be an `int32_t` then you need to store it as one, then write `sizeof(int32_t)` bytes to that address using `memcpy()`. – tadman Feb 13 '21 at 04:58

3 Answers


Here's an adaptation that is closer to your goal:

#include <stdio.h>   // perror
#include <stdlib.h>  // malloc
#include <string.h>  // memcpy

char* play_mem_block(size_t bytes)
{
    char *mem_block = malloc(bytes);
    if (!mem_block) {
        perror("Looks like there was an error");
        return NULL;
    }

    // write a one-byte char value at address 0
    char a = 'a';
    memcpy(mem_block, &a, sizeof(a));   // memcpy takes the address of the source

    // write a four-byte (int) value
    int ffffour = 4444;
    memcpy(mem_block + 1, &ffffour, sizeof(ffffour));

    return mem_block;
}

In practice you'll want to create a "buffer" handler of some kind for allocation, and then helper functions like this:

void* write_int32_t(void* b, int32_t n) {
  memcpy(b, &n, sizeof(n));

  // Cast to char* first: arithmetic on void* is a GNU extension, not ISO C
  return (char*)b + sizeof(n);
}

You can stamp out variants of that for every type you need to write. You could also write an inverse read_int32_t function:

void* read_int32_t(void* b, int32_t* n) {
  // sizeof(*n) is the size of the int32_t; sizeof(n) would be the pointer size
  memcpy(n, b, sizeof(*n));

  return (char*)b + sizeof(*n);
}

Which is just as trivial.
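
As a rough usage sketch, the two helpers can be chained to pack values back to back and then read them out again. The helpers are repeated so the block compiles on its own; `roundtrip_demo` and the 16-byte buffer size are made up for illustration, and the byte order stored in the buffer is whatever the host uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The helpers from this answer, repeated so the block is self-contained. */
static void *write_int32_t(void *b, int32_t n) {
    memcpy(b, &n, sizeof(n));
    return (char *)b + sizeof(n);
}

static void *read_int32_t(void *b, int32_t *n) {
    memcpy(n, b, sizeof(*n));
    return (char *)b + sizeof(*n);
}

/* Hypothetical demo: pack a char and two int32_t values back to back,
   then read them out again. Returns 1 only if every value reads back
   intact (0 on mismatch or allocation failure). */
static int roundtrip_demo(void) {
    char *buf = malloc(16);
    if (!buf) return 0;

    buf[0] = 'a';                            /* 1-byte store at offset 0 */
    void *p = write_int32_t(buf + 1, 4444);  /* 4-byte store at offset 1 (unaligned) */
    p = write_int32_t(p, -7);                /* 4-byte store at offset 5 */
    assert((char *)p == buf + 9);            /* cursor advanced by 2 * 4 bytes */

    int32_t x, y;
    p = read_int32_t(buf + 1, &x);
    p = read_int32_t(p, &y);

    int ok = (buf[0] == 'a' && x == 4444 && y == -7);
    free(buf);
    return ok;
}
```

Returning the advanced pointer is what makes the calls chainable: each call hands back the position where the next value goes.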

tadman
  • Can you pick a var name other than `ffff`? It looks like it's supposed to be a hex constant :/ But yes, exactly, wrapper functions for memcpy of different sizes solves the alignment and strict-aliasing problems, and will inline to single store instructions on ISAs (like x86-64) that allow unaligned stores. – Peter Cordes Feb 13 '21 at 05:02
  • @PeterCordes For you, `ffffour`. – tadman Feb 13 '21 at 05:05
  • uh, ok I'll take it :P – Peter Cordes Feb 13 '21 at 05:06

You can do:

*(int *)(mem_block + 4) = 4444;    // only safe for aligned offsets

or

((int *)mem_block)[1] = 4444;

In order to write as int you need an expression of type int to write through, hence the cast.

Be careful about alignment: some systems have alignment requirements, and writing an int at an offset of 3 bytes is undefined behavior wherever that address is not suitably aligned for int. To do unaligned writes safely, use the memcpy version.

Your original approach is fine and arguably easier to read, so don't be in a hurry to do something different for the sake of it.
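
To make the alignment caveat concrete, here is a small sketch that uses the direct cast only when the address is suitably aligned for `int` and falls back to `memcpy` otherwise. The helper names `is_aligned_for_int`, `store_int` and `load_int` are invented for this example, C11 is assumed for `_Alignof`, and the buffer is assumed to come from `malloc` (so it has no declared type that the `int` store could conflict with).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is suitably aligned for an int.
   The pointer-to-uintptr_t conversion is implementation-defined, but it
   behaves as expected on mainstream flat-address platforms. */
static int is_aligned_for_int(const void *p) {
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Hypothetical helper: store v at mem + offset.
   Assumes mem points into malloc'ed storage. */
static void store_int(char *mem, size_t offset, int v) {
    if (is_aligned_for_int(mem + offset))
        *(int *)(mem + offset) = v;          /* direct store, aligned only */
    else
        memcpy(mem + offset, &v, sizeof v);  /* safe at any offset */
}

/* Hypothetical helper: read an int back from any offset via memcpy. */
static int load_int(const char *mem, size_t offset) {
    int v;
    memcpy(&v, mem + offset, sizeof v);
    return v;
}
```

In practice the `memcpy` path alone is enough; the branch is only there to show which offsets the cast is valid for.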

Peter Cordes
M.M

Asm like that is how C structs work; casting to a struct foo* with the layout you want is one option.
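
As an illustration of the struct approach, the sketch below describes a made-up layout and lets the compiler choose the offsets. `struct frame`, its member names and `make_frame` are hypothetical. The compiler may insert padding to keep members aligned, so the offsets will not necessarily match what hand-written asm would pick; `offsetof` reports where each member actually landed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: one byte, a 16-bit value, and a 32-bit value. */
struct frame {
    uint8_t  tag;    /* 1-byte member, stored like movb */
    uint16_t word;   /* 2-byte member, stored like movw */
    uint32_t dword;  /* 4-byte member, stored like movl */
};

/* Allocate a frame and fill it in by member name. Each assignment is a
   store of that member's width at the offset the compiler chose.
   Returns NULL on allocation failure; the caller frees the result. */
static struct frame *make_frame(void) {
    struct frame *f = malloc(sizeof *f);
    if (!f) return NULL;
    f->tag   = 7;
    f->word  = 777;
    f->dword = 4444;
    return f;
}
```

Because every member sits at an offset that is a multiple of its alignment, none of these stores is misaligned, which is the main thing the struct buys you over manual offsets.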

Beware of strict-aliasing and alignment rules in C if you want to do overlapping accesses with different types. C is not portable assembly language; you have to obey C's rules if you don't want a compiler to "break" your code at compile time. (Or, from another point of view, if you want your code to mean what you intend rather than being already broken.) For example, "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?" shows that alignment-related segfaults are possible after compiler optimization even on an ISA like x86-64 where normal scalar unaligned loads/stores never fault.

Use memcpy(ptr+3, &u32value, sizeof(u32value)); for an unaligned 4-byte store that you can safely reload with a different type. (Here u32value is declared as something like uint32_t u32value = 1234; and uint32_t is required to be exactly 32 bits wide with no padding, if the system provides it.)

With a decent modern compiler targeting an ISA (like x86-64) where unaligned loads are cheap, that memcpy will fairly reliably inline and optimize to a single mov store with the right operand size.
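
Tying this back to the size/value/address form in the question, one possible wrapper narrows the value to a fixed-width type and copies exactly that many bytes with `memcpy`. The names `poke` and `peek` are invented here and do not come from any library; values are stored in the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store `value`, narrowed to `size` bytes, at
   base + offset. size must be 1, 2, 4, or 8.
   Returns 0 on success, -1 for any other size. */
static int poke(void *base, size_t offset, size_t size, uint64_t value) {
    unsigned char *dst = (unsigned char *)base + offset;
    switch (size) {
    case 1: { uint8_t  v = (uint8_t)value;  memcpy(dst, &v, sizeof v); return 0; }
    case 2: { uint16_t v = (uint16_t)value; memcpy(dst, &v, sizeof v); return 0; }
    case 4: { uint32_t v = (uint32_t)value; memcpy(dst, &v, sizeof v); return 0; }
    case 8: { memcpy(dst, &value, sizeof value); return 0; }
    default: return -1;
    }
}

/* Hypothetical inverse: load `size` bytes from base + offset and
   zero-extend to 64 bits. Returns 0 for an unsupported size. */
static uint64_t peek(const void *base, size_t offset, size_t size) {
    const unsigned char *src = (const unsigned char *)base + offset;
    switch (size) {
    case 1: { uint8_t  v; memcpy(&v, src, sizeof v); return v; }
    case 2: { uint16_t v; memcpy(&v, src, sizeof v); return v; }
    case 4: { uint32_t v; memcpy(&v, src, sizeof v); return v; }
    case 8: { uint64_t v; memcpy(&v, src, sizeof v); return v; }
    default: return 0;
    }
}
```

When `size` is a compile-time constant at the call site, an optimizing compiler has a good chance of reducing each call to a single store or load of the matching width, though that is not guaranteed.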

*(uint32_t*)(ptr+3) = 1234 would not be safe, because alignof(uint32_t) is 4 on most C implementations (ones with 8-bit char). Since malloc itself returns memory sufficiently aligned to store any type (up to max_align_t), ptr+3 is definitely misaligned (except on unusual systems with wide char where sizeof(uint32_t)=1, or where the implementation chooses not to require alignment of wider types).


Or in GNU C we can use type attributes to tell the compiler about under-aligned versions of types, and/or give them the same alias-anything ability as char*.

(__m128i and other Intel vector intrinsic types use the may_alias attribute; that's how compilers make it safe to _mm_load_ps( (float*)ptr_to_int ) and stuff like that.)

typedef uint32_t unaligned_aliasing_u32 __attribute__((aligned(1),may_alias));


char *ptr = malloc(1234);

*(unaligned_aliasing_u32*)(ptr+3) = 5678;   // movl $5678, 3(%rax)
*(unaligned_aliasing_u32*)(ptr+5) = 5678;   // movl $5678, 5(%rax) overlapping
int tmp = ((const uint32_t*)ptr)[1];       // mov 4(%rax), %edx   aligned so I used plain uint32 for example.

See part of my answer on Why does glibc's strlen need to be so complicated to run quickly? for another example of using a may_alias load to safely read a buffer as longs, unlike glibc's version which is only "safe" because it's compiled with known compilers and without the possibility of inlining into a caller that loads or stores the same memory with a different type. (That one does need to be aligned so I omitted the aligned(1) attribute.)

Peter Cordes