There are a lot of factors at play, and I can't enumerate most of them, let alone understand them all. But here's a little glimpse of some of the things happening in the background:
Virtual Memory
On most modern user systems you don't actually have direct access to RAM. There are multiple layers of indirection, one of them being Virtual Memory. Virtual memory (VM) is memory that your process accesses as if it were normal contiguous RAM, but which the underlying system translates to the proper address in physical RAM. So using a virtual address to access physical memory directly is almost certainly not going to give you the data you were looking for.
Virtual Memory also has layers. Modern processors include native support for Virtual Memory, and it is usually handled by an MMU (memory management unit) near or on the same die as the processor.
A lot of OSs also have their own layer of virtual memory, which they then translate either to the MMU-managed virtual memory on the processor or directly to physical RAM.
As an example of how far the rabbit hole goes, Linux actually has lazy memory allocation. When you first allocate memory, nothing is communicated to the CPU; the allocation is only recorded in a kernel data structure. When you later access that memory, the CPU generates a Page Fault. The kernel's page fault handler then checks whether that memory was lazily allocated and, if so, actually allocates it.
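To make the "virtual address is not a physical address" point concrete, here is a minimal sketch, assuming a POSIX system with fork. The names shared_looking_value and parent_keeps_own_copy are made up for illustration. The child writes to a variable at the same virtual address the parent uses, yet the parent never sees the write, because after fork the two processes have separate address spaces mapped to different physical memory.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives at one virtual address, which parent and child both inherit. */
static int shared_looking_value = 1;

/* Hypothetical helper: the child stores child_value into the variable and
 * exits; the parent then inspects its own copy at the same virtual address.
 * Returns 1 if the parent's copy is unchanged, 0 if it changed, -1 on error. */
int parent_keeps_own_copy(int child_value)
{
    /* If both values were equal, the final comparison would prove nothing. */
    assert(child_value != shared_looking_value);

    int original = shared_looking_value;

    fflush(stdout);               /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: same virtual address, but the write lands in the child's
         * own physical page (typically created by copy-on-write). */
        shared_looking_value = child_value;
        printf("child:  %p holds %d\n",
               (void *)&shared_looking_value, shared_looking_value);
        fflush(stdout);
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    printf("parent: %p holds %d\n",
           (void *)&shared_looking_value, shared_looking_value);
    return shared_looking_value == original;
}
```

Calling parent_keeps_own_copy(2) from main should print the same pointer value twice with two different contents.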
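You can observe the lazy allocation from user space. The following is a rough sketch, assuming Linux (or another system with anonymous mmap and getrusage); the struct and function names are hypothetical. It maps a region, then writes one byte per page, and records how many minor page faults the process took during each step. On a typical Linux box the mmap call itself costs almost no faults, while touching the pages costs many; the exact numbers depend on page size and on whether transparent huge pages are in use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

struct fault_counts {
    long during_map;   /* minor page faults taken while calling mmap */
    long during_touch; /* minor page faults taken while writing to the pages */
};

/* Minor page faults (no disk I/O) taken by this process so far, or -1. */
static long minor_faults(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_minflt;
}

/* Hypothetical helper: maps `bytes` of anonymous memory and writes one byte
 * per page. Returns 0 on success and fills *out, or returns -1 on failure. */
int count_lazy_faults(size_t bytes, struct fault_counts *out)
{
    assert(bytes > 0 && out != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    long before_map = minor_faults();
    volatile unsigned char *region =
        mmap(NULL, bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    long after_map = minor_faults();

    /* The first write to a page that has no physical memory behind it yet
     * makes the CPU fault, and the kernel allocates the page at that point. */
    for (size_t offset = 0; offset < bytes; offset += (size_t)page)
        region[offset] = 1;
    long after_touch = minor_faults();

    munmap((void *)region, bytes);

    if (before_map < 0 || after_map < 0 || after_touch < 0)
        return -1;

    out->during_map = after_map - before_map;
    out->during_touch = after_touch - after_map;
    return 0;
}
```

Printing both fields after a call with, say, 16 MiB gives a feel for how much work is deferred from the allocation to the first access.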
Kernel Space vs User Space
Userspace programs aren't allowed to modify physical memory directly, and in the case of *nixes they make System Calls to have the kernel do that for them. A system call changes the operating mode of the CPU and is often a relatively slow operation.
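If you want a feel for the cost of crossing into the kernel, a crude micro-benchmark like the one below can help. It is only a sketch, assuming Linux with glibc (it uses syscall and SYS_getpid); the function names are made up, and the numbers vary a lot between machines and kernel versions, so treat them as a rough comparison and nothing more. It times a loop of ordinary function calls against a loop of raw getpid system calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* A trivial function that never leaves user space. */
static long user_space_call(long x)
{
    return x + 1;
}

/* Nanoseconds spent on `iterations` plain function calls. */
uint64_t time_user_calls(long iterations)
{
    assert(iterations > 0);

    /* Calling through a volatile pointer stops the compiler from inlining
     * the call or removing the loop. */
    long (*volatile fn)(long) = user_space_call;
    volatile long sink = 0;

    uint64_t start = now_ns();
    for (long i = 0; i < iterations; i++)
        sink = fn(sink);
    uint64_t elapsed = now_ns() - start;

    (void)sink;
    return elapsed;
}

/* Nanoseconds spent on `iterations` raw getpid system calls. */
uint64_t time_syscalls(long iterations)
{
    assert(iterations > 0);

    volatile long sink = 0;

    uint64_t start = now_ns();
    for (long i = 0; i < iterations; i++)
        sink = syscall(SYS_getpid);   /* enters the kernel on every iteration */
    uint64_t elapsed = now_ns() - start;

    (void)sink;
    return elapsed;
}
```

Comparing time_user_calls(1000000) with time_syscalls(1000000) on your own machine shows the gap better than any number I could quote.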
Library Functions
Library functions like malloc have to do a lot of bookkeeping to make sure that when you call free on a pointer, you only free that part. But they also allocate in bulk. On *nixes, malloc typically obtains memory from the kernel through a syscall such as brk or mmap, and it asks for more than the current request needs (at least a page). Subsequent malloc calls will continue to use that memory until you need more.
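The bulk behaviour is easy to see if your allocator grows the heap with brk, which glibc's malloc typically does for small requests (large ones usually go through mmap instead). The sketch below makes that assumption; count_break_moves is a hypothetical helper. It performs many small malloc calls and counts how often the program break, as reported by sbrk(0), actually moved. With other allocators the break may never move at all, in which case the count is simply zero.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: performs `count` malloc calls of `size` bytes each and
 * returns how many of them moved the program break (the end of the
 * brk-managed heap). Returns -1 if an allocation fails. */
long count_break_moves(size_t count, size_t size)
{
    assert(count > 0 && size > 0);

    /* Allocate the bookkeeping array before measuring anything. */
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;

    long moves = 0;
    void *last_break = sbrk(0);
    size_t done = 0;

    for (; done < count; done++) {
        blocks[done] = malloc(size);
        if (blocks[done] == NULL)
            break;

        /* The break only changes when malloc had to ask the kernel for more. */
        void *current_break = sbrk(0);
        if (current_break != last_break) {
            moves++;
            last_break = current_break;
        }
    }

    int failed = done < count;
    for (size_t i = 0; i < done; i++)
        free(blocks[i]);
    free(blocks);

    return failed ? -1 : moves;
}
```

For something like count_break_moves(10000, 64) I would expect a result far below 10000, since most calls are served from memory the allocator already holds.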
How does this relate to this question?
The above is only a glimpse of what happens when you're working with memory. How you allocate the memory, in what quantity, and what flags you pass to the system all change a lot of things, and can explain the discrepancies between the results.
Suggestion
Try running strace on those processes to see where they spend most of their time! (strace -c prints a per-syscall summary of call counts and time.)