
Say, for a process, I know it has a page located at virtual address 0x42424242. I want to manually swap it out to the swap area. From user space, this seems hard to do: I didn't find any related commands to force the kernel to do it. Does that mean I have to hack the kernel and implement this feature?

yuanqili

2 Answers


Look at https://stackoverflow.com/questions/578137/can-i-tell-linux-not-to-swap-out-a-particular-processes-memory. In particular, you may want to check the madvise() system call and its advice argument MADV_DONTNEED: http://man7.org/linux/man-pages/man2/madvise.2.html.
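
One practical detail: madvise() requires a page-aligned start address, and 0x42424242 is not page-aligned. Below is a minimal sketch, assuming Linux and glibc, of rounding an arbitrary address down to its page and applying MADV_DONTNEED to that single page. The helper names page_start() and drop_page_containing() are hypothetical, made up for this illustration.

```c
#define _DEFAULT_SOURCE /* exposes MADV_DONTNEED and MAP_ANONYMOUS in glibc */
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round an arbitrary address down to the start of its
 * page. Page sizes are powers of two, so masking off the low bits works. */
static void *page_start(const void *addr)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    return (void *)((uintptr_t)addr & ~(page_size - 1));
}

/* Hypothetical helper: tell the kernel the process no longer needs the single
 * page containing addr. Returns 0 on success, or -1 with errno set (e.g.
 * ENOMEM if that page is not mapped). */
int drop_page_containing(const void *addr)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    return madvise(page_start(addr), page_size, MADV_DONTNEED);
}
```

With 4 KiB pages, drop_page_containing((void *)0x42424242) would act on the page starting at 0x42424000. Keep in mind that, per the quote that follows, this discards a private page's contents rather than writing them to swap.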

Here's a quote from The Linux Programming Interface, p. 1055:

MADV_DONTNEED The calling process no longer requires the pages in this region to be memory-resident. The precise effect of this flag varies across UNIX implementations. ... Linux:

for a MAP_PRIVATE region, the mapped pages are explicitly discarded, which means that modifications to the pages are lost. The virtual memory address range remains accessible, but the next access of each page will result in a page fault reinitializing the page, either with the contents of the file from which it is mapped or with zeros in the case of an anonymous mapping.

for a MAP_SHARED region, the kernel may discard modified pages in some circumstances, depending on the architecture (doesn't occur on x86).
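
The MAP_PRIVATE file case from the quote can be checked directly. This is a minimal sketch, assuming Linux and glibc: it maps a temporary file privately, modifies the copy-on-write page, applies MADV_DONTNEED, and checks whether the next access sees the file's original contents again. The function name and the temporary-file pattern are hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MADV_DONTNEED, mkstemp, pwrite in glibc */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the private modification is discarded and
 * the file's contents reappear, 0 if not, -1 on setup error. */
int private_file_change_is_discarded(void)
{
    char path[] = "/tmp/dontneed-demo-XXXXXX"; /* hypothetical temp-file name */
    const char original[] = "original";
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    char *p;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until fd is closed */

    /* One page of file data that begins with "original" (rest zero-filled). */
    if (ftruncate(fd, (off_t)page_size) != 0 ||
        pwrite(fd, original, sizeof original, 0) != (ssize_t)sizeof original)
        goto out;

    p = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    memcpy(p, "modified", sizeof "modified"); /* private copy-on-write change */
    if (madvise(p, page_size, MADV_DONTNEED) == 0)
        result = strcmp(p, original) == 0; /* refault reloads the file page */

    munmap(p, page_size);
out:
    close(fd);
    return result;
}
```

For an anonymous private mapping, the same sequence would instead leave the page zero-filled, matching the second half of the quoted sentence.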

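Of course, madvise() advice is in general only a "hint" that the kernel may ignore, although on Linux MADV_DONTNEED on a private mapping has the definite effect described above.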

Juraj Martinka
  • Yes, we want to force selected pages to be swapped out to implement a custom cache mechanism for our system. As the system requires microsecond-level response times, merely giving Linux a hint doesn't seem enough. – yuanqili Mar 16 '20 at 11:33
  • It's best to measure, but if you have real-time requirements, you may need to look at a different OS/kernel. – Juraj Martinka Mar 16 '20 at 11:39

There's a kernel algorithm that decides this; I'll try to illustrate it in generic terms.

Even though a file has been mmapped, it's only read from disk when it is accessed. There's a fair chance that 0x42424242 was never accessed at all, depending on the type of data residing there.
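
Demand paging is easy to observe with mincore(2), which reports whether the pages of a mapping are resident. This is a minimal sketch, assuming Linux and glibc; it uses an anonymous mapping to stay self-contained, though demand paging applies to file mappings too. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes mincore and MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the page at page-aligned addr is resident,
 * 0 if not, -1 on error. */
int page_is_resident(void *addr)
{
    unsigned char vec = 0;
    if (mincore(addr, (size_t)sysconf(_SC_PAGESIZE), &vec) != 0)
        return -1;
    return vec & 1; /* bit 0 set means the page is resident */
}

/* Hypothetical demo: map one page, check residency before and after the
 * first write. Returns 0 on success, -1 if mmap fails. */
int residency_before_and_after_touch(int *before, int *after)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *before = page_is_resident(p); /* mapped, but never accessed yet */
    p[0] = 1;                      /* first access: page fault brings it in */
    *after = page_is_resident(p);
    munmap(p, page_size);
    return 0;
}
```

On a typical system, *before is 0 and *after is 1: the mapping existed all along, but RAM was only committed on first access.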

Say the page at 0x42424xxx has been accessed. The page-fault handler retrieves it from disk and sets up the kernel structures accordingly; among other things, it sets page_last_access (a generic name) to an equivalent of RDTSC's output and clears the 'A' (accessed) bit in the CPU page-table entry for that page (by calling pte_mkold(), on Linux).

Periodically, the kernel scans the page tables for pages whose 'A' bit is set. For each one, it updates page_last_access and resets 'A' to 0, leaving the CPU to set it again on the page's next access. Note that this means the CPU issues a memory write (to the page-table entry) as a side effect of a read, write, or execute at another address.
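
The aging scan just described can be simulated in a few lines. This is a simplified user-space model, not kernel code: struct sim_page, sim_touch(), and sim_scan() are hypothetical names, with a plain bool standing in for the hardware 'A' bit and a caller-supplied counter standing in for the RDTSC-like timestamp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page descriptor (not a real kernel structure). */
struct sim_page {
    bool accessed;        /* stands in for the PTE's 'A' bit */
    uint64_t last_access; /* stands in for the generic page_last_access */
};

/* Simulates any read/write/execute: the CPU would set the 'A' bit. */
void sim_touch(struct sim_page *pg) { pg->accessed = true; }

/* One periodic scan: for every page with 'A' set, record the current time and
 * clear 'A' so the next access sets it again. Returns how many pages were
 * accessed since the previous scan. */
size_t sim_scan(struct sim_page *pages, size_t n, uint64_t now)
{
    size_t seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].accessed) {
            pages[i].last_access = now;
            pages[i].accessed = false;
            seen++;
        }
    }
    return seen;
}
```

Pages that are never touched between scans keep an old last_access value, which is what makes them candidates for eviction later.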

Whenever a process needs more memory (e.g. when it touches memory obtained from malloc()) and free RAM runs short, the kernel's algorithm looks for the oldest known page and either frees it or swaps it out, depending on whether that page's dirty bit is set. It's reasonable to assume that, at first, these pages will likely belong to processes older than your own.
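
The eviction choice in the previous paragraph can be sketched the same way. This is a hypothetical, simplified model following the description above (oldest resident page first; free it if clean, swap it out if dirty). Real kernels also distinguish file-backed from anonymous pages, which this sketch ignores; all names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum evict_action { EVICT_NONE, EVICT_FREE, EVICT_SWAP_OUT };

/* Hypothetical page descriptor for this sketch (not a kernel structure). */
struct sim_page {
    bool present;         /* page currently occupies RAM */
    bool dirty;           /* modified since it was loaded */
    uint64_t last_access; /* generic page_last_access timestamp */
};

/* Picks the resident page with the oldest last_access and stores its index in
 * *victim (or n if no page is resident). A clean victim is simply freed; a
 * dirty one is swapped out first. */
enum evict_action sim_pick_victim(const struct sim_page *pages, size_t n,
                                  size_t *victim)
{
    size_t oldest = n;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].present &&
            (oldest == n || pages[i].last_access < pages[oldest].last_access))
            oldest = i;
    }
    *victim = oldest;
    if (oldest == n)
        return EVICT_NONE;
    return pages[oldest].dirty ? EVICT_SWAP_OUT : EVICT_FREE;
}
```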

TL;DR: Mapping a big file and then reading just a handful of its pages is common practice; userspace programs should not worry about it.