I believe you have slightly misunderstood what mlock does. Its intended uses are to:
- Ensure that reads from the memory never stall because the data has not been loaded from disk yet or has been swapped out (useful for performance reasons and crucial in real-time applications).
- Ensure that the pages are never swapped out (crucial for private data such as passwords or private keys held in clear text).
So it guarantees that the pages are loaded into RAM and prevents them from being swapped out. It makes no guarantee that dirty pages mapped from a file will not be written back (and in fact they are written back, see the experiment below).
To hint to the kernel that you are going to read from an fd soon, there is posix_fadvise(), so
posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
will probably load the requested part of the file into the page cache. (POSIX_FADV_RANDOM only tells the kernel to expect random access; it does not ask for the data to be read in.)
I can't say this for sure, but I suspect there is currently no way to forbid write-back of dirty pages for a specific file. There might be some way to hint at it, but I don't see one either.
An experiment with mmap/mlock
alexander@goblin ~/tmp $ cat mmap.c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

int main(int argc, char *argv[]) {
    char *addr;
    int fd;
    struct stat sb;
    size_t length;

    if (argc != 2) {
        fprintf(stderr, "%s file\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    fd = open(argv[1], O_RDWR);
    if (fd == -1) {
        handle_error("open");
    }
    if (fstat(fd, &sb) == -1) { /* To obtain file size */
        handle_error("fstat");
    }
    length = sb.st_size;

    /* Shared mapping: stores dirty the file's pages in the page cache. */
    addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        handle_error("mmap");
    }
    /* Lock the whole mapping into RAM. */
    if (mlock(addr, length) < 0) {
        handle_error("mlock");
    }

    /* Dirty the first page, then keep the locked mapping alive
       long enough to watch whether the kernel writes it back. */
    strcpy(addr, "hello world!");
    sleep(100);

    munmap(addr, length);
    close(fd);
    exit(EXIT_SUCCESS);
}
alexander@goblin ~/tmp $ grep . /proc/sys/vm/dirty_{expire,writeback}_centisecs
/proc/sys/vm/dirty_expire_centisecs:1000
/proc/sys/vm/dirty_writeback_centisecs:500
alexander@goblin ~/tmp $ dd if=/dev/zero of=foo bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 8.1296e-05 s, 50.4 MB/s
alexander@goblin ~/tmp $ fallocate -l 4096 foo
alexander@goblin ~/tmp $ sync foo
alexander@goblin ~/tmp $ sudo hdparm --fibmap foo
foo:
filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
0 279061632 279061639 8
alexander@goblin ~/tmp $ sudo dd if=/dev/mapper/vg_main-gentoo_home count=8 skip=279061632 iflag=nocache 2>/dev/null | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000
alexander@goblin ~/tmp $ gcc mmap.c
alexander@goblin ~/tmp $ ./a.out foo &
[1] 26450
alexander@goblin ~/tmp $ sudo hdparm --fibmap foo
foo:
filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
0 279061632 279061639 8
alexander@goblin ~/tmp $ sudo dd if=/dev/mapper/vg_main-gentoo_home count=8 skip=279061632 iflag=nocache 2>/dev/null | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000
alexander@goblin ~/tmp $ sleep 10
alexander@goblin ~/tmp $ sudo hdparm --fibmap foo
foo:
filesystem blocksize 4096, begins at LBA 0; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
0 279061632 279061639 8
alexander@goblin ~/tmp $ sudo dd if=/dev/mapper/vg_main-gentoo_home count=8 skip=279061632 iflag=nocache 2>/dev/null | hexdump -C
00000000 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 00 00 00 |hello world!....|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000
alexander@goblin ~/tmp $ fg
./a.out foo
^C