Yes, you are paying the price for that extra check. It applies not just to pointer indirection but to any memory access (other than, say, DMA). However, the cost of the check is very small.
While your process is running, the page table does not change very often. Parts of the page table are cached in the translation lookaside buffer (TLB), and accessing a page whose entry is in the TLB incurs no additional penalty.
If your process accesses a page without a TLB entry, the CPU must make an additional memory access to fetch the page table entry for that page. That entry is then cached in the TLB.
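To make the translation concrete, here's a minimal sketch (the variable names are my own) of how a virtual address splits into the page number that the TLB and page table are keyed on, and the offset that is used unchanged within the physical page:

```c
/* Minimal sketch: split a virtual address into page number + offset.
 * The hardware looks up the page number in the TLB / page table;
 * the offset is reused as-is within the resulting physical page. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

int main(void) {
    long page_size = sysconf(_SC_PAGESIZE);   /* typically 4096 bytes */
    int x = 42;
    uintptr_t addr = (uintptr_t)&x;

    uintptr_t page_number = addr / (uintptr_t)page_size;
    uintptr_t offset      = addr % (uintptr_t)page_size;

    printf("page size:    %ld bytes\n", page_size);
    printf("virtual addr: %#lx\n", (unsigned long)addr);
    printf("page number:  %#lx (looked up in TLB / page table)\n",
           (unsigned long)page_number);
    printf("page offset:  %#lx\n", (unsigned long)offset);
    return 0;
}
```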
You can see the effect of this in action by writing a test program (a rough sketch follows the list below). Give your test program a big chunk of memory and start randomly reading and writing locations in it; use a command-line parameter to change the chunk's size.
- Above the L1 cache size, performance will drop due to L2 cache latency.
- Above the L2 cache size, performance will drop to RAM latency.
- Above the size of the memory addressed by the TLB, performance will drop due to TLB misses. (This might happen before or after you run out of L2 cache space, depending on a number of factors.)
- Above the size of available RAM, performance will drop due to swapping.
- Above the size of available swap space and RAM, the application will be terminated by the OS.
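Here is a sketch of such a test program, assuming Linux/POSIX timing; the access count and the xorshift index generator are arbitrary choices of mine, not the only way to do it:

```c
/* Rough sketch of the experiment described above.
 * Usage: ./bench <buffer-size-in-bytes>
 * Allocates the buffer, does 100 million random read-modify-writes,
 * and reports the average time per access. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <buffer-size-in-bytes>\n", argv[0]);
        return 1;
    }
    size_t size = (size_t)strtoull(argv[1], NULL, 10);
    if (size == 0) {
        fprintf(stderr, "size must be > 0\n");
        return 1;
    }

    unsigned char *buf = malloc(size);
    if (!buf) {
        perror("malloc");
        return 1;
    }
    memset(buf, 0, size);                       /* fault every page in first */

    const size_t accesses = 100u * 1000u * 1000u;
    unsigned long long seed = 88172645463325252ULL;  /* xorshift64 state */
    unsigned long long checksum = 0;             /* keeps the loop from being optimized out */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < accesses; i++) {
        /* cheap pseudo-random index (xorshift64) */
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        size_t idx = (size_t)(seed % size);
        checksum += buf[idx];                   /* read  */
        buf[idx] = (unsigned char)checksum;     /* write */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%zu bytes: %.2f ns per access (checksum %llu)\n",
           size, secs * 1e9 / (double)accesses, checksum);
    free(buf);
    return 0;
}
```

Run it repeatedly while doubling the size and plot nanoseconds per access against size. On Linux you can also run it under `perf stat -e dTLB-loads,dTLB-load-misses` (if your CPU exposes those events) to watch the TLB-miss cliff directly.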
If your operating system allows "big pages", the TLB might be able to cover a very large address space indeed. Perhaps you can sabotage the OS by allocating 4 KiB chunks individually with mmap, in which case the TLB misses might be felt with only a few megabytes of working set, depending on your processor.
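A hedged sketch of that idea, assuming Linux-style anonymous mmap (and assuming the kernel doesn't merge the mappings or back them with transparent huge pages anyway):

```c
/* Sketch: many separate 4 KiB mappings instead of one big allocation,
 * so each touched page likely needs its own TLB entry. The constants
 * are arbitrary; ~16 MiB of working set here. */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
    const size_t page = 4096;
    const size_t npages = 4096;              /* 4096 * 4 KiB = 16 MiB */
    unsigned char **pages = malloc(npages * sizeof *pages);
    if (!pages) { perror("malloc"); return 1; }

    for (size_t i = 0; i < npages; i++) {
        pages[i] = mmap(NULL, page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (pages[i] == MAP_FAILED) { perror("mmap"); return 1; }
        pages[i][0] = 1;                     /* fault the page in */
    }

    /* Touch the pages in a scattered order; with one TLB entry needed per
     * page, misses show up at a much smaller working set than with a
     * contiguous, possibly huge-page-backed allocation. */
    unsigned long long seed = 0x9E3779B97F4A7C15ULL, sum = 0;
    for (size_t i = 0; i < 50u * 1000u * 1000u; i++) {
        seed ^= seed << 13; seed ^= seed >> 7; seed ^= seed << 17;
        sum += pages[seed % npages][seed % page];
    }
    printf("checksum %llu\n", sum);
    return 0;
}
```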
However: The small performance drop must be weighed against the benefits of virtual memory, which are too numerous to list here.