I have a weird problem with some kernel code I have written. I can't share the exact code, but I can give the general idea of what's going on.
I'm working on a Windows project that modifies the page tables of a process in order to hook a function: I change the PFN in the function's PTE so that it points to a different physical page with modified contents.
Once the hooked function is called, it does some processing that looks like this:
void HookFunctionViaPTE()
{
    // get a pointer to the PTE for the function "MyRoutine"
    PPTE pte = RetrievePTE(&MyRoutine);
    pte->PFN = g_HijackedCodePfn;

    // g_HijackedCodePfn is the PFN of an allocated page in memory containing
    // a copy of the page which "MyRoutine" lies in.
    // Overwrite it with a jump to "MyRoutine_Hook"
    memcpy(g_HijackedCodePtr + VirtualAddress.PFNOffset, hookCode, sizeof(hookCode));
}

void MyRoutine_Hook(
    PVOID context
    )
{
    // some work here

    // call original version of this function
    // setup PTE to point to old physical page
    RestoreOriginalPFNInPTE();
    __writecr3(__readcr3());

    // this should call into the original code and not into this hook recursively
    MyRoutine();

    // go back to hacked context
    RestoreHackedPFNInPTE();
    __writecr3(__readcr3());

    // other work here
}
Essentially, within the hook I modify the page tables so that the virtual address maps to the original physical page again. That way, when I call MyRoutine from inside the hook, it should run the original code instead of re-entering the hook.
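To make the PFN swap concrete, here is a minimal sketch of the bit manipulation involved, written as plain C on a raw 64-bit PTE value. It assumes the standard x86-64 4 KiB PTE layout (PFN in bits 12..51). The names pte_get_pfn and pte_with_pfn are hypothetical and are not the helpers from my actual code.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4 KiB PTE: bits 12..51 hold the page frame number (PFN). */
#define PTE_PFN_SHIFT 12
#define PTE_PFN_MASK  0x000FFFFFFFFFF000ULL

/* Extract the PFN from a raw 64-bit PTE value. */
static uint64_t pte_get_pfn(uint64_t pte)
{
    return (pte & PTE_PFN_MASK) >> PTE_PFN_SHIFT;
}

/* Return a copy of pte with its PFN replaced; all flag bits are preserved. */
static uint64_t pte_with_pfn(uint64_t pte, uint64_t pfn)
{
    assert((pfn >> 40) == 0); /* the PFN field is 40 bits wide */
    return (pte & ~PTE_PFN_MASK) | (pfn << PTE_PFN_SHIFT);
}
```

Swapping to the hijacked page and back is then the same operation with two different PFNs. The flag bits (present, writable, global, NX and so on) are left untouched in both directions.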
There is one problem, though: everything works perfectly when I step through each line with a debugger. When I let the code run freely, however, it seems as if the CPU forgets that I have changed the page tables: when MyRoutine is called in the hook, it calls the hook again. I've tried pretty much everything to fix it, including invalidating the paging entries, flushing the entire TLB, and even recreating the paging structures in a separate physical page and then setting cr3 to that. Nothing fixes the problem.
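One detail about the cr3 reloads that may be relevant: as far as I understand the architecture, writing cr3 does not drop TLB entries for pages whose PTE has the Global bit set while CR4.PGE is enabled. The sketch below only expresses that rule as a predicate on raw values. It assumes CR4.PCIDE is 0, the name cr3_reload_evicts is hypothetical, and I have not confirmed that this is what is happening in my case.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_GLOBAL (1ULL << 8)  /* G bit in a 4 KiB PTE */
#define CR4_PGE    (1ULL << 7)  /* Page Global Enable bit in CR4 */

/* True if a plain cr3 reload is expected to drop the cached translation
   for this PTE. Assumes CR4.PCIDE = 0. Global translations survive a
   cr3 reload and need invlpg or a CR4.PGE toggle instead. */
static bool cr3_reload_evicts(uint64_t pte, uint64_t cr4)
{
    bool global_in_effect = (pte & PTE_GLOBAL) && (cr4 & CR4_PGE);
    return !global_in_effect;
}
```

If the PTE for MyRoutine turns out to be global, the __writecr3(__readcr3()) calls in the hook would not be enough on their own to evict the stale translation.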
I had some success using __wbinvd, but the behavior was strange. I had to place it right before the function call to make any noticeable difference, but even then it didn't work.
I realize an exact solution is hard to give without the full source, but can someone explain what conditions could cause the CPU to behave like this, or what I could do to fix it?