Why do we need address virtualization in an operating system?

Question

I am currently taking a course in Operating Systems and I came across address virtualization. I will give a brief about what I know and follow that with my question.

Basically, the CPU(modern microprocessors) generates virtual addresses and then an MMU(memory management unit) takes care of translating those virtual address to their corresponding physical addresses in the RAM. The example that was given by the professor is there is a need for virtualization because say for example: You compile a C program. You run it. And then you compile another C program. You try to run it but the resident running program in memory prevents loading a newer program even when space is available.

From my understanding, I think having no virtualization, if the compiler generates two physical addresses that are the same, the second won't run because it thinks there isn't enough space for it. When we virtualize this, as in the CPU generates only virtual addresses, the MMU will deal with this "collision" and find a spot for the other program in RAM.(Our professor gave the example of the MMU being a mapping table, that takes a virtual address and maps it to a physical address). I thought of that idea to be very similar to say resolving collisions in a hash table.

Could I please get some input on my understanding and any further clarification is appreciated.

Process isolation, plain and simple. One process can have zero unintended impact on another process, be it load address collisions, memory corruption, etc. — Jonathon Reinhart, Oct 26 '15 at 02:48

Stephen C · Accepted Answer · 2015-10-26T03:07:49.063

Could I please get some input on my understanding and any further clarification is appreciated.

Your understanding is roughly correct.

Clarifications:

The data structures are nothing like a hash table.
If anything, the data structures are closer to a BTree, but even there are important differences with that as well. It is really closest to a (Java) N-dimensional array which has been sparsely allocated.
It is mapping pages rather than complete virtual / physical addresses. (A complete address is a page address + an offset within the page.).
There is no issue with collision. At any point in time, the virtual -> physical mappings for all users / processes give a one-to-one mapping from (process id + virtual page) to a either a physical RAM page or a disk page (or both).

The reasons we use virtual memory are:

process isolation; i.e. one process can't see or interfere with another processes memory
simplifying application writing; i.e. each process thinks it has a contiguous set off memory addresses, and the same set each time. (To a first approximation ...)
simplifying compilation, linking, loading; i.e. the compilers, etc there is no need to "relocate" code at compile time or run time to take into account other.
to allow the system to accommodate more processes than it has physical RAM for ... though this comes with potential risks and performance penalties.

Thanks. So a process thinks it has all the memory for itself(No restriction). However for the actual underlying practical restriction, the MMU deals with it. The restriction would be I suppose the available space in RAM and the extension to that space that could be done in some secondary storage? — Ralph, Oct 26 '15 at 03:02
Yes, yes, and yes. Also, there is the problem that if lots of processes all attempt to use lots of memory at the same time, you can get "thrashing". — Stephen C, Oct 26 '15 at 03:03

score 0 · Answer 2 · answered Oct 26 '15 at 06:19

I think you have a fundamental misconception about what goes on in an operating system in regard to memory.

(1) You are describing logical memory, not virtual memory. Virtual memory refers to the use of disk storage to simulate memory. Unmapped pages of logical memory get mapped to disk space.

Sadly, the terms logical memory and virtual memory get conflated but they are distinct concepts the the distinction is becoming increasingly important.

(2) Programs run in a PROCESS. A process only runs one program at a time (in unix each process generally only runs one program (two if you count the cloned caller) in its life.

In modern systems each process process gets a logical address space (sequential addresses) that can be mapped to physical locations or no location at all. Generally, part of that logical address space is mapped to a kernel area that is shared by all processes. The logical address space is create with the process. No address space—no process.

In a 32-bit system, addresses 0-7FFFFFFF might be user address that are (generally) mapped to unique physical locations while 80000000-FFFFFFFFmight be mapped to a system address space that is the same for all processes.

(3) Logical memory management primarily serves as a means of security; not as a means for program loading (although it does help in that regard).

(4) This example makes no sense to me:

You compile a C program. You run it. And then you compile another C program. You try to run it but the resident running program in memory prevents loading a newer program even when space is available.

You are ignoring the concept of a PROCESS. A process can only have one program running at a time. In systems that do permit serial running of programs with the same process (e.g., VMS) the executing program prevents loading another program (or the loading of another program causes the termination of the running program). It is not a memory issue.

(5) This is not correct at all:

From my understanding, I think having no virtualization, if the compiler generates two physical addresses that are the same, the second won't run because it thinks there isn't enough space for it. When we virtualize this, as in the CPU generates only virtual addresses, the MMU will deal with this "collision" and find a spot for the other program in RAM.

The MMU does not deal with collisions. The operating system sets up tables that define the logical address space when the process start. Logical memory has nothing to do with hash tables.

When a program accesses logical memory the rough sequence is:

Break down the address into a page and an offset within the page.
Does the page have an corresponding entry in the page table? If not FAULT.
Is the entry in the page table valid? If not FAULT.
Does the page table entry allow the type of access (read/write/execute) requested in the current operating mode (kernel/user/...)? If not FAULT.
Does the entry map to a physical page? If not PAGE FAULT (go load the page from disk--virtual memory––and try again).
Access the physical memory referenced by the page table.

Why do we need address virtualization in an operating system?

2 Answers2