That premise seems a bit odd to me; I don't really see why the stacks would be file-mapped after such a CRIU operation. But anyway:
First off: userfaultfd does work with file mappings backed by shmem/tmpfs (and hugetlbfs). But I don't know whether that helps in your case. If not:
You can't register the file mapping with userfaultfd, but you can register the new anonymous mapping with userfaultfd. This means that one thing you could do would be to first replace the stack with the new mapping, then copy the data over from the file when you know the old mapping is no longer used.
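Here is a rough sketch of that replace-then-copy order, with the userfaultfd part left out so that it runs without any special privileges. The helper name replace_with_anon_and_copy and its path argument are made up for illustration. It assumes Linux, that the old mapping was MAP_SHARED (so the file holds the current stack contents), and that nothing touches the range while it runs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the file-backed mapping at [addr, addr+len)
 * with fresh anonymous memory, then refill it from the backing file.
 * Assumption: the old mapping was MAP_SHARED, so the file is up to date.
 * Returns 0 on success, -1 with errno set on failure.
 */
int replace_with_anon_and_copy(void *addr, size_t len,
                               const char *path, off_t file_off)
{
    /* Open the file first so a bad path cannot leave a zeroed stack behind. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Step 1: MAP_FIXED drops the file mapping and puts zero-filled
     * anonymous pages at the same address in a single call. */
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    assert(p == addr);

    /* In the real scheme, the range would be registered with userfaultfd
     * here and filled on demand; this sketch just copies eagerly. */

    /* Step 2: copy the old contents back in from the file. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)addr + done, len - done,
                          file_off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break; /* file shorter than the mapping; the rest stays zero */
        done += (size_t)n;
    }
    close(fd);
    return 0;
}
```

The eager copy in step 2 is exactly the part that blocks for the whole stack size.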
You probably don't want to do exactly this, because then you'd have to block for as long as it takes to copy the entire stack. There are two optimizations you could consider:
- You could stop the thread and find its current stack pointer. Any memory far enough below the stack pointer, past whatever the ABI allows code to use there (e.g. the 128-byte red zone on amd64), doesn't need to be copied at all; you only have to register the currently used part of the stack with userfaultfd. (A good way to do this would probably be to send a signal to the thread and let the signal handler take care of it.)
If your threads typically have relatively little stack usage and only use lots of stack memory for short moments, this is probably all you need?
- You could copy the file contents into anonymous memory area A ahead of time while letting the kernel monitor which of the file mapping pages have been written to. Then after you replace the file mapping with a new anonymous mapping B with userfaultfd, you can ask the kernel which parts of the file mapping have been written to, copy all those parts into mapping A again, and then mremap() mapping A over the file mapping. This probably only makes sense if your stacks are typically pretty big. To figure out which parts of a file mapping have changed, you can use the kernel's soft-dirty interface, using bit 55 in /proc/[pid]/pagemap and /proc/[pid]/clear_refs.
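To make the bookkeeping behind those two optimizations a bit more concrete, here is a small sketch in C. All names (live_stack_range, pagemap_offset, and so on) are made up for illustration. It assumes a downward-growing stack with a page-aligned lower bound, the amd64 red zone size, and the pagemap layout of one 64-bit entry per virtual page with soft-dirty in bit 55.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define AMD64_RED_ZONE 128u              /* bytes below rsp that code may use */
#define PAGEMAP_SOFT_DIRTY (1ULL << 55)  /* soft-dirty bit in a pagemap entry */

struct range { uintptr_t start, end; };  /* half-open: [start, end) */

/*
 * First optimization: the page-aligned part of the stack that is still in
 * use, given the stack pointer. Everything below `start` can be skipped.
 * Assumes stack_lo is page-aligned and the stack grows downward.
 */
struct range live_stack_range(uintptr_t stack_lo, uintptr_t stack_hi,
                              uintptr_t sp, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(stack_lo <= sp && sp <= stack_hi);

    /* Keep the red zone below sp, without going past the stack's low end. */
    uintptr_t lo = (sp - stack_lo > AMD64_RED_ZONE) ? sp - AMD64_RED_ZONE
                                                    : stack_lo;
    lo &= ~(page_size - 1);              /* round down to a page boundary */
    if (lo < stack_lo)
        lo = stack_lo;
    return (struct range){ lo, stack_hi };
}

/* Second optimization: byte offset of vaddr's entry in /proc/[pid]/pagemap. */
uint64_t pagemap_offset(uintptr_t vaddr, uintptr_t page_size)
{
    return (uint64_t)(vaddr / page_size) * sizeof(uint64_t);
}

/* True if the entry says the page was written since soft-dirty was cleared. */
bool pagemap_entry_soft_dirty(uint64_t entry)
{
    return (entry & PAGEMAP_SOFT_DIRTY) != 0;
}

/*
 * Read one entry from an already opened /proc/[pid]/pagemap descriptor.
 * Returns 0 on success, -1 on a failed or short read.
 */
int read_pagemap_entry(int pagemap_fd, uintptr_t vaddr, uintptr_t page_size,
                       uint64_t *entry)
{
    ssize_t n = pread(pagemap_fd, entry, sizeof *entry,
                      (off_t)pagemap_offset(vaddr, page_size));
    return n == (ssize_t)sizeof *entry ? 0 : -1;
}
```

The soft-dirty bits are reset by writing "4" to /proc/[pid]/clear_refs, which you would do right before making the ahead-of-time copy into A, so that only later writes show up as dirty.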