The reason I ask this question is that, while testing the behavior of the Linux soft-dirty bit, I found that creating a thread, without touching any memory, sets the soft-dirty bit of all pages to 1 (dirty).
For example, I `malloc()` 100 MB in the main thread, clear the soft-dirty bits, and then create a thread that just sleeps. After the thread is created, the soft-dirty bit of every page in that 100 MB chunk is set to 1.
Here is the test program I'm using:
```cpp
#include <thread>
#include <iostream>
#include <string>
#include <vector>
#include <cstdint>
#include <cstdio>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>

#define PAGE_SIZE_4K 0x1000

// Returns the soft-dirty bit (bit 55) of the pagemap entry for vaddr.
int GetDirtyBit(uint64_t vaddr) {
  int fd = open("/proc/self/pagemap", O_RDONLY);
  if (fd < 0) {
    perror("Failed open pagemap");
    exit(1);
  }
  // One 8-byte entry per 4 KB virtual page.
  off_t offset = vaddr / 4096 * 8;
  if (lseek(fd, offset, SEEK_SET) < 0) {
    perror("Failed lseek pagemap");
    exit(1);
  }
  uint64_t pfn = 0;
  if (read(fd, &pfn, sizeof(pfn)) != sizeof(pfn)) {
    perror("Failed read pagemap");
    sleep(1000);
    exit(1);
  }
  close(fd);
  return (pfn & (1ULL << 55)) ? 1 : 0;
}

// Writing "4" to clear_refs clears the soft-dirty bits of all pages.
void CleanSoftDirty() {
  int fd = open("/proc/self/clear_refs", O_RDWR);
  if (fd < 0) {
    perror("Failed open clear_refs");
    exit(1);
  }
  char cmd[] = "4";
  if (write(fd, cmd, sizeof(cmd)) != sizeof(cmd)) {
    perror("Failed write clear_refs");
    exit(1);
  }
  close(fd);
}

int demo(int argc, char *argv[]) {
  int x = 1;
  // 100 MB
  uint64_t size = 1024UL * 1024UL * 100;
  void *ptr = malloc(size);
  // Arithmetic on void* is not valid C++, so work through a char*.
  char *cptr = static_cast<char *>(ptr);
  for (uint64_t s = 0; s < size; s += PAGE_SIZE_4K) {
    // populate pages
    memset(cptr + s, x, PAGE_SIZE_4K);
  }
  // GetDirtyBit() returns int, so print with %d.
  printf("Soft dirty after malloc: %d, (50MB offset)%d\n",
         GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
         GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
  printf("ALLOCATE FINISHED\n");
  std::vector<std::thread> threads;
  while (true) {
    sleep(2);
    // Set soft dirty of all pages to 0.
    CleanSoftDirty();
    printf("Soft dirty after reset: %d, (50MB offset)%d\n",
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
    // Create a thread that only sleeps.
    threads.push_back(std::thread([]() { while (true) sleep(1); }));
    sleep(2);
    printf("Soft dirty after create thread: %d, (50MB offset)%d\n",
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
    // memset the first 20MB
    memset(cptr, x++, 1024UL * 1024UL * 20);
    printf("Soft dirty after memset: %d, (50MB offset)%d\n",
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr)),
           GetDirtyBit(reinterpret_cast<uint64_t>(cptr + 50 * 1024 * 1024)));
  }
  return 0;
}

int main(int argc, char *argv[]) {
  std::string last_arg = argv[argc - 1];
  printf("PID: %d\n", getpid());
  return demo(argc, argv);
}
```
I print the soft-dirty bit of the first page and of the page at offset `50 * 1024 * 1024`. Here is what happens:

1. The soft-dirty bits after `malloc()` are 1, which is expected.
2. After clearing the soft-dirty bits, they become 0.
3. Create a thread that just sleeps.
4. Check the dirty bits: all pages in the 100 MB region now have the soft-dirty bit set to 1 (I didn't print the dirty bits of all pages, but I checked them separately).
5. Restart the loop. Now the behavior is as expected: the soft-dirty bits remain 0 after creating additional threads.
   - The soft-dirty bit of the page at offset 0 is 1 because I `memset()` the first 20 MB, and the soft-dirty bit of the page at 50 MB remains 0.
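Checking every page in the 100 MB region, rather than just the two printed ones, does not need one `open()` per page. A minimal sketch, assuming (as `GetDirtyBit` already does) that each `/proc/self/pagemap` entry is 8 bytes indexed by virtual page number and that bit 55 is the soft-dirty flag: read all entries for the range with a single `pread()` and count the set bits. `CountSoftDirtyPages` is a hypothetical helper name, not an existing API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: counts the pages in [addr, addr + len) whose
// /proc/self/pagemap entry has bit 55 (soft-dirty) set. Returns -1 on error.
// All entries are fetched with one pread() instead of one open() per page.
long CountSoftDirtyPages(const void *addr, size_t len) {
  assert(len > 0);
  const uint64_t page = static_cast<uint64_t>(sysconf(_SC_PAGESIZE));
  const uint64_t first = reinterpret_cast<uint64_t>(addr) / page;
  const uint64_t last = (reinterpret_cast<uint64_t>(addr) + len - 1) / page;
  std::vector<uint64_t> entries(last - first + 1);

  int fd = open("/proc/self/pagemap", O_RDONLY);
  if (fd < 0) {
    perror("open pagemap");
    return -1;
  }
  // Each virtual page has one 8-byte entry, indexed by virtual page number.
  const size_t bytes = entries.size() * sizeof(uint64_t);
  const ssize_t n = pread(fd, entries.data(), bytes,
                          static_cast<off_t>(first * sizeof(uint64_t)));
  close(fd);
  if (n != static_cast<ssize_t>(bytes)) {
    perror("pread pagemap");
    return -1;
  }

  long dirty = 0;
  for (uint64_t e : entries) {
    dirty += static_cast<long>((e >> 55) & 1);
  }
  return dirty;
}
```

In the program above it could be called as `CountSoftDirtyPages(cptr, size)` next to each `printf` and compared with the number of pages the range spans (`size / 4096`, plus one if `cptr` is not page-aligned).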
Here is the output:

```
Soft dirty after malloc: 1, (50MB offset)1
ALLOCATE FINISHED
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 1, (50MB offset)1
Soft dirty after memset: 1, (50MB offset)1
(steps 1-4 above)
(step 5 starts below)
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 0, (50MB offset)0
Soft dirty after memset: 1, (50MB offset)0
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 0, (50MB offset)0
Soft dirty after memset: 1, (50MB offset)0
Soft dirty after reset: 0, (50MB offset)0
Soft dirty after create thread: 0, (50MB offset)0
Soft dirty after memset: 1, (50MB offset)0
```
I thought thread creation would just mark the pages as being in a "shared" state without modifying them, so the soft-dirty bit should remain unchanged. Apparently, that is not what happens. So my question is: does creating a thread trigger page faults on all of the pages, so that the kernel sets every page's soft-dirty bit to 1 while handling those faults?
If that is not the case, why does creating a thread make all memory pages of the process "dirty"? And why does only the first thread creation behave this way?
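One way to test the page-fault hypothesis directly is to compare the process's minor page-fault counter before and after creating a thread. The sketch below does this with `getrusage(RUSAGE_SELF, ...)`; the function names are hypothetical, and unlike the program above it uses a thread that exits immediately and is joined, so the count also includes whatever faults the new thread's own startup causes. If creating a thread faulted in every page of the buffer, one would expect roughly one fault per 4 KB page (about 25,600 for 100 MB, assuming 4 KB pages and no transparent huge pages); a much smaller number would suggest something other than page faults is setting the bits.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <sys/resource.h>
#include <thread>

// Hypothetical helper: minor page faults taken so far by the whole process
// (RUSAGE_SELF sums the usage of all threads in the process).
long MinorFaults() {
  struct rusage ru;
  if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
  return ru.ru_minflt;
}

// Hypothetical helper: touches every page of a size-byte buffer, then returns
// how many minor faults the process takes while one thread is created and
// joined. The thread body is empty, so the count covers thread creation plus
// the new thread's own startup and exit.
long FaultsDuringThreadCreate(size_t size) {
  char *buf = static_cast<char *>(malloc(size));
  assert(buf != nullptr);
  memset(buf, 1, size);  // fault in the whole buffer before measuring

  const long before = MinorFaults();
  std::thread t([] {});
  t.join();
  const long after = MinorFaults();

  free(buf);
  return after - before;
}
```

For the 100 MB case it would be called as `FaultsDuringThreadCreate(1024UL * 1024UL * 100)`, ideally once for the first thread and once more afterwards, to see whether the first creation differs from later ones.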
I hope I explained the question well. Please let me know if more details are needed or if anything doesn't make sense.