33

So, I understand that if you need some dynamically allocated memory, you can use malloc(). For example, your program reads a variable-length file into a char[]. You don't know in advance how big to make the array, so you allocate the memory at runtime.

I'm trying to understand when you would use mmap(). I have read the man page and to be honest, I don't understand what the use case is.

Can somebody explain a use case to me in simple terms? Thanks in advance.

Steve Walsh

4 Answers

47

mmap can be used for a few things. First, a file-backed mapping. Instead of allocating memory with malloc and reading the file, you map the whole file into memory without explicitly reading it. Now when you read from (or write to) that memory area, the operations act on the file, transparently. Why would you want to do this? It lets you easily process files that are larger than the available memory using the OS-provided paging mechanism. Even for smaller files, mmapping reduces the number of memory copies.
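To make the file-backed case concrete, here is a minimal sketch. The helper name `count_newlines` is hypothetical, and the example assumes a POSIX system and a regular file. It maps the file read-only and scans it like an ordinary array, with no `read()` loop and no user-space buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file by mapping it read-only.
 * Returns the count, or -1 on error. */
long count_newlines(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)    /* pages are faulted in on demand */
        if (data[i] == '\n')
            count++;

    munmap((void *)data, len);
    return count;
}
```

Only the pages the loop actually touches need to be resident, which is why this approach also works for files larger than physical RAM, as long as the mapping fits in the virtual address space.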

mmap can also be used for an anonymous mapping. This mapping is not backed by a file, and is basically a request for a chunk of memory. If that sounds similar to malloc, you are right. In fact, most implementations of malloc will internally use an anonymous mmap to provide a large memory area.
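As a rough sketch of the anonymous case (the helper names `alloc_pages` and `free_pages` are made up for illustration, and `MAP_ANONYMOUS` is assumed to be available, as it is on Linux, the BSDs, and macOS):

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: get len bytes of zero-filled memory directly from
 * the kernel. Returns NULL on failure. */
void *alloc_pages(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Unlike free(), munmap needs the length of the region back. */
void free_pages(void *p, size_t len)
{
    if (p != NULL)
        munmap(p, len);
}
```

The kernel rounds the region up to whole pages, so this is wasteful for small objects. That is one reason to keep using malloc for everyday allocations and leave the mapping to the allocator.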

Another common use case is to have multiple processes map the same file as a shared mapping to obtain a shared memory region. The file doesn't have to be actually written to disk. shm_open is a convenient way to make this happen.

Greg Inozemtsev
  • Do you happen to have a link with more details on how to access a large file with only a small amount of memory? This point confuses me about the way mmap() works... – Ioan Aug 23 '12 at 16:14
  • @Ioan There aren't really any details for large files: you just `mmap` them. The file does have to fit into the _virtual_ memory of course, but the entire virtual memory does not need to be present in RAM at any given time. `mmap` uses the same mechanism as swap space. But an explanation of how virtual memory works would be a bit too long for a comment :) – Greg Inozemtsev Aug 23 '12 at 16:51
  • Your comment makes more sense. I misunderstood what you meant about processing files larger than available memory. – Ioan Aug 23 '12 at 17:18
  • @Ioan Typically you map a view (or many views) of a file at a file offset. The offset can be larger than the address space of your OS as long as the size of the view isn't. It's a common way to handle multi-GB data sets on a 32-bit OS. – Martin Beckett Aug 23 '12 at 18:26
  • @MartinBeckett Yes, this too I knew. I misunderstood that you were saying it's possible to edit the entire file larger than addressable memory in one go. – Ioan Aug 24 '12 at 12:12
9

Whenever you need to read or write fixed-size blocks of data, it's much simpler (and often faster) to map the data file into memory with mmap and access it directly than to allocate memory, read the file, access the data, potentially write the data back to disk, and free the memory.
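A small sketch of what that looks like in practice. The `struct record` layout and the `bump_hits` helper are assumptions made up for this example; the point is that updating one record becomes a plain array store on a `MAP_SHARED` mapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-size record; the layout is assumed for illustration. */
struct record {
    uint32_t id;
    uint32_t hits;
};

/* Increment the hits field of record number `index` in place.
 * Returns 0 on success, -1 on error. */
int bump_hits(const char *path, size_t index)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (index >= len / sizeof(struct record)) {   /* bounds check */
        close(fd);
        return -1;
    }

    /* MAP_SHARED: stores to the mapping are carried through to the file. */
    struct record *recs = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    close(fd);
    if (recs == MAP_FAILED)
        return -1;

    recs[index].hits++;             /* no seek, read, modify, write sequence */

    int rc = msync(recs, len, MS_SYNC);   /* flush before reporting success */
    munmap(recs, len);
    return rc;
}
```

A long-running program would normally keep the mapping open across many updates rather than mapping and unmapping per call; it is done per call here only to keep the sketch self-contained.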

Martin Beckett
  • Does this mean that `mmap`-ed files stay in memory for ever? Because if not, the process of `mmap` is equivalent to "allocate-read-access-writeback-free" yes? We are just calling it by another name? – Ambareesh Mar 26 '21 at 16:31
0

Consider the famous producer-consumer problem. The producer creates a shared memory object using shm_open(). Since the goal is for the producer and consumer to share data, the producer then uses the mmap syscall to map that shared memory object into its address space. The consumer can open the same shared memory object (shared memory objects are referred to by a "name"), call mmap to map it into its own address space just as the producer did, and read from it.
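A stripped-down sketch of those two halves follows. The function names `produce` and `consume` and the `SHM_SIZE` constant are hypothetical, and the sketch leaves out synchronization entirely; a real producer and consumer would need a semaphore or similar so the consumer knows when data is ready.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_SIZE 4096   /* assumed size of the shared region */

/* Producer: create the named object, size it, map it, write a message.
 * Returns 0 on success, -1 on error. */
int produce(const char *name, const char *msg)
{
    assert(strlen(msg) < SHM_SIZE);

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, SHM_SIZE) == -1) {    /* a new object has size 0 */
        close(fd);
        return -1;
    }
    char *mem = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);
    if (mem == MAP_FAILED)
        return -1;

    strcpy(mem, msg);               /* visible to every process mapping it */
    munmap(mem, SHM_SIZE);
    return 0;
}

/* Consumer: open the same name, map it read-only, copy the message out.
 * Returns 0 on success, -1 on error. */
int consume(const char *name, char *out, size_t outlen)
{
    assert(outlen > 0);

    int fd = shm_open(name, O_RDONLY, 0);
    if (fd == -1)
        return -1;
    const char *mem = mmap(NULL, SHM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (mem == MAP_FAILED)
        return -1;

    strncpy(out, mem, outlen - 1);
    out[outlen - 1] = '\0';
    munmap((void *)mem, SHM_SIZE);
    return 0;
}
```

The two functions would normally run in different processes that agree on a name such as "/demo" (an arbitrary example). The object persists until someone calls shm_unlink() on that name. On older glibc versions you may need to link with -lrt.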

0

A typical use: a network kernel driver that needs to make incoming data available to its user-space counterpart. A "shared buffer" is mapped into both sides, which makes the whole data exchange very natural. Of course, you need to add a management layer on top of it, and probably some signalling.