How are the sizes of pointers determined in computer systems? Via virtual or physical addresses?

Question

I have an exam tomorrow on virtual memory address translation and I'm rather confused about this topic. I know the CPU will generate a virtual address to then access a physical address. So if we have a system with 32-bit virtual addresses and 64-bit physical addresses, then I'm guessing the pointers for user-level processes will be 8 bytes. My reasoning is that the virtual address is being translated to the physical address, so this number will always be coming from the physical address.

score 2 · Answer 1 · answered Oct 23 '19 at 22:46

No, user-space processes work only with virtual addresses (32-bit in your example).

The memory they "see" is their own private virtual address space. (They can make system calls like mmap and munmap to request that pages in that address-space be backed by files, or by anonymous RAM like for C malloc.) But they don't know anything about where in physical memory those pages are located.
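As a quick illustration of that point, a process can ask how wide its own pointers are, and the answer depends only on the ABI it was built for, not on how much physical address space the machine has. This is a minimal sketch using Python's standard `ctypes` module; the helper names `pointer_size_bytes` and `pointer_width_bits` are made up for this example.

```python
import ctypes


def pointer_size_bytes() -> int:
    """Size in bytes of a data pointer (void *) in this process's ABI."""
    return ctypes.sizeof(ctypes.c_void_p)


def pointer_width_bits() -> int:
    """Pointer width in bits. The usable virtual address width may be smaller."""
    return pointer_size_bytes() * 8


# A 32-bit build of the interpreter reports 4 bytes even when it runs under a
# 64-bit kernel on a machine with far more than 4GiB of RAM.
print(f"pointer size in this process: {pointer_size_bytes()} bytes")
```

Running the same script under a 32-bit and a 64-bit interpreter on the same machine should give different answers, since nothing in it looks at physical memory.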

The OS can even "fake it" by paging out some of their pages to swap space / page file, and handling the page fault if the process touches such a page by doing I/O to bring it back in and then waking up the process to rerun the load or store instruction that page faulted.
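A toy model of that paging behaviour might look like the sketch below. Everything in it is hypothetical (the class `ToyProcessMemory`, the dicts standing in for RAM and swap); a real kernel does this with page-table entries and disk I/O, and the retry is done by the CPU re-executing the faulting instruction.

```python
class ToyProcessMemory:
    """Toy model: pages of one process are either resident in RAM or in swap."""

    def __init__(self):
        self.resident = {}    # virtual page number -> page contents in RAM
        self.swapped = {}     # virtual page number -> page contents in swap
        self.page_faults = 0

    def page_out(self, vpn):
        """The 'OS' evicts a page to swap; the process is not told."""
        self.swapped[vpn] = self.resident.pop(vpn)

    def _handle_page_fault(self, vpn):
        """Bring the page back from swap, or fail if it was never mapped."""
        self.page_faults += 1
        if vpn not in self.swapped:
            raise LookupError(f"invalid access to page {vpn}")  # like a segfault
        self.resident[vpn] = self.swapped.pop(vpn)

    def load(self, vpn, offset):
        """A load from the process's point of view: same address, same data."""
        if vpn not in self.resident:
            self._handle_page_fault(vpn)
        return self.resident[vpn][offset]  # the retried access now succeeds
```

The process only ever names the page by its virtual page number, so the same `load` works whether the page was in RAM or had to be fetched back first.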

Hardware translates virtual addresses to physical addresses on every memory access. To make this fast, a TLB caches recently-used translations. On a TLB miss, hardware does a "page walk", reading the page tables to find the right virtual page->physical page translation.
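To make the translation step concrete, here is a rough sketch of a TLB lookup with a page-walk fallback. It assumes 4KiB pages and a flat, single-level page table stored in a dict, which is a simplification (real x86 page tables are multi-level); `ToyMMU` and `PageFault` are names invented for this example.

```python
PAGE_SHIFT = 12                 # 4KiB pages (assumed for this sketch)
PAGE_SIZE = 1 << PAGE_SHIFT


class PageFault(Exception):
    """Raised when a virtual page has no valid mapping."""


class ToyMMU:
    def __init__(self, page_table):
        # virtual page number -> physical frame number, maintained by the "OS"
        self.page_table = dict(page_table)
        self.tlb = {}           # cache of recently used translations
        self.tlb_hits = 0
        self.page_walks = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & (PAGE_SIZE - 1)
        if vpn in self.tlb:
            self.tlb_hits += 1
            pfn = self.tlb[vpn]
        else:
            self.page_walks += 1            # TLB miss: read the page table
            if vpn not in self.page_table:
                raise PageFault(hex(vaddr))
            pfn = self.page_table[vpn]
            self.tlb[vpn] = pfn
        # The page offset is unchanged; only the page number is replaced.
        return (pfn << PAGE_SHIFT) | offset


# A 32-bit virtual address mapped to a physical frame above 4GiB.
mmu = ToyMMU({0x12345: 0xABCDE01})
print(hex(mmu.translate(0x12345678)))
```

The virtual address fits in 32 bits while the resulting physical address does not, and the process never sees the wider value.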

The OS manages the page tables, choosing any physical page as "backing" for a virtual page.

Physical addresses wider than virtual?

Under a multi-tasking OS, multiple processes can be running. Each one has its own 32-bit (4GiB) virtual address space.

The size of physical address space limits how much RAM you can put in a machine total, and can be different from how much any single process can use at once. Changing page tables is faster than reading from disk, so even if it can't all be mapped at once, a kernel can still make use of lots of physical RAM for pagecache (cache of file contents from disk).

More importantly, multiple processes can be running, each with their own up-to-4GiB of virtual address space backed by physical memory, up to the amount of physical RAM in the system. On a CPU with multiple cores, these can be running simultaneously, truly allowing simultaneous use of more than 4GB of RAM. But not by any single process.

x86 is a good example here: running an x86-64 kernel with 32-bit user-space gives us pretty much the situation you describe. (A 64-bit kernel can use 64-bit virtual addresses, but never mind that, just look at user-space.)

You can have several processes each using about 4GiB of physical RAM.

The x86-64 page-table format has room for physical addresses as wide as 52-bit, although current HW doesn't use that many. (Only as wide as the amount of RAM it actually supports attaching. Saves bits in the TLBs, and other parts of the CPU). https://en.wikipedia.org/wiki/X86-64#Architectural_features

Before x86-64, 32-bit x86 kernels could use the same page-table format but with 36-bit physical addresses, on CPUs from Pentium Pro and later. https://en.wikipedia.org/wiki/Physical_Address_Extension. That allowed up to 64GB of physical RAM. (A 32-bit kernel would typically reserve 1 or 2GB of virtual address space for itself so each process could really only use up to 3 or 2GB, but it's the same idea. Not a problem for 32-bit user-space under a 64-bit kernel though, so that made a simpler example.)
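The numbers above are just powers of two, which can be checked with a few lines of arithmetic. This is only a sketch: the helper names are made up, and the "fully backed processes" figure is an upper bound that ignores the kernel's own memory, page tables, and shared pages.

```python
GIB = 1 << 30


def addressable_bytes(address_bits):
    """Number of bytes reachable with byte addresses of the given width."""
    return 1 << address_bits


def max_fully_backed_processes(phys_bits, virt_bits):
    """Upper bound on processes whose whole address space fits in RAM at once."""
    return addressable_bytes(phys_bits) // addressable_bytes(virt_bits)


print(addressable_bytes(32) // GIB, "GiB per 32-bit process")
print(addressable_bytes(36) // GIB, "GiB of physical address space with PAE")
print(addressable_bytes(52) // GIB, "GiB allowed by the x86-64 page-table format")
print(max_fully_backed_processes(36, 32), "such processes at most with PAE")
```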

score 0 · Answer 2 · answered Oct 23 '19 at 20:39

Virtual addresses are visible to user-level processes. They should never see the physical address. So if virtual addresses are 32-bit, pointers in user-level processes are also 32-bit, i.e. 4 bytes.

The system/kernel then needs to do the translation somehow. It will know the virtual address and must translate it to the physical address, so it will eventually have a physical pointer, 64-bit = 8 bytes. But once again, this address/pointer is for "internal use" only.

In practice though, you will have virtual and physical addresses of the same size, matching the word size of the CPU and its architecture (x86 vs x86_64). A virtual-to-physical translation will normally need to happen in a page fault, which happens when a user-level process attempts to access memory that is not loaded. To access it in the first place, it needs to have e.g. dereferenced a pointer pointing to that address, which would be done with a memory access instruction of the particular CPU architecture, which is done with word-sized addresses.

Actually 32-bit x86 with PAE (36-bit physical addresses) is a great example here. https://en.wikipedia.org/wiki/Physical_Address_Extension. The new page-table format had more room for physical address bits, each 32-bit process was limited to 4GB (or 2 or 3 under a high-half kernel) of virtual address space, but *multiple* such processes could be running at once, each using separate physical memory. — Peter Cordes, Oct 23 '19 at 22:22
Also no, the kernel doesn't manually *translate*; on a page fault it just sets up the page tables for HW page walking. (Except on some systems like MIPS with software TLB management). The kernel manages allocation of physical pages, so it needs to know physical addresses (or at least page numbers) for that, but it fills them into structures in memory. — Peter Cordes, Oct 23 '19 at 22:25

Moosa Mahsoom · Answer 3 · 2019-10-23T22:38:17.360

The programmer will only see virtual addresses. The physical address space is opaque to the programmer and the user. Therefore, the size of a pointer depends on the size of the virtual address. In the particular case you have given, the maximum amount of memory your system can consume is dictated by your virtual address space. This is why a 32-bit OS on 64-bit hardware is limited to a maximum of 4 gigs of memory. But in the case of a 64-bit virtual address, even if we have insufficient RAM, we can offload some of the pages to secondary storage to give the illusion that we have more RAM available. If a page is in secondary storage when it is accessed, a page fault occurs and the page is transferred to RAM.

Edit: As Peter said in the comments, the virtual address limit affects the maximum memory a process can consume.

The virtual address space is the limit for one *process*, not the whole system. On a 32-bit x86 kernel with PAE for example, each process can be using up to 4GB of virtual address space, each backed by separate physical pages (up to 64GB total, 36-bit physical addresses) — Peter Cordes, Oct 23 '19 at 22:27
