Trace page table access of a Linux process

Question

I am writing to ask about the feasibility of tracing the page table accesses (in terms of the "index" of each page table access) of an ordinary Linux user application. Basically, I am trying to reproduce the attack described in this research article (https://www.ieee-security.org/TC/SP2015/papers-archived/6949a640.pdf). In particular, the data-page accesses need to be recorded and then used to infer program secrets.

I understand that on a Linux system on the 64-bit x86 architecture, the page table size is 4K. I have used pin (https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool) to log a trace of addresses for all virtual memory accesses. Can I simply calculate the "index" of each data page table access with the following translation rule?

index = address >> 15

My reasoning is that 4KB = 2 ^ 15. Is that correct? Thank you in advance for any suggestions or comments.

Also, one thing I want to point out is that, conceptually, I don't need a "precise" identifier for each data page, just a number (an "index") to distinguish accesses to different data pages. This should provide conceptually the same amount of information as their attack uses.

What is an "index" of a page table access? Is this the index of a row in the table that is used for address translation? Note that x86 and x86_64 have **several** levels of page tables, so a single address translation accesses several tables, and it is unclear what your "index" represents. — Tsyvarev, May 05 '20 at 07:39
What's `pin`? What do you mean by "index"? What are you *exactly* trying to achieve? Are you writing a kernel module? Are you working with a userspace program? Please clarify. Saying "page table index" alone does not mean much, as Linux has several levels of page tables and each address corresponds to an index in each one of them. — Marco Bonelli, May 05 '20 at 13:18
@Tsyvarev Sorry for the confusion. I updated my question with more information and a reference. Could you please take a look and see if it makes more sense this time? Thank you! — lllllllllllll, May 05 '20 at 14:32
@MarcoBonelli Thank you! I updated my question with additional info. Could you kindly take a look and see if that makes sense? Thank you! — lllllllllllll, May 05 '20 at 14:32
The simplest way to do this would be to instrument qemu to record memory accesses. — stark, May 05 '20 at 14:55
Keep in mind that pin performs memory accesses of its own and may change the place where memory allocations are made by the program — nitzanms, May 06 '20 at 18:05
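
To illustrate the multi-level point raised in the comments above: with standard 4-level paging on x86_64 and 4KB pages, a 48-bit virtual address splits into four 9-bit table indices plus a 12-bit page offset. The following is only a sketch under that assumption (no huge pages, no 5-level paging); the function name `table_indices` and the dictionary keys are made up for illustration.

```python
# Sketch: split an x86_64 virtual address into its per-level table indices.
# Assumes 4-level paging with 4KB pages; names here are illustrative only.

def table_indices(address):
    """Return the index used at each page table level, plus the page offset."""
    return {
        "pml4": (address >> 39) & 0x1FF,   # bits 47..39
        "pdpt": (address >> 30) & 0x1FF,   # bits 38..30
        "pd": (address >> 21) & 0x1FF,     # bits 29..21
        "pt": (address >> 12) & 0x1FF,     # bits 20..12
        "offset": address & 0xFFF,         # bits 11..0
    }

# Build an address from known indices to check the split.
example = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(table_indices(example))
```

This is why a single "page table index" is ambiguous: every address has one index per level.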

Marco Bonelli · Accepted Answer · 2020-05-05T14:55:19.493

OK, so you don't really need an "index", just some unique identifier to distinguish different pages in the virtual address space of a process.

In that case, you can just do address >> PAGE_SHIFT. On x86 with 4KB pages, PAGE_SHIFT is 12 (4KB = 2^12, not 2^15), so you can do:

page_id = address >> 12

Then, if address1 and address2 fall in the same page, page_id will be the same for both addresses.
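
As a rough sketch of how this could be applied to a logged address trace, the snippet below maps each address to its page identifier and collapses consecutive accesses to the same page. The collapsing step and the names `page_id` and `page_trace` are assumptions made for illustration, not something the question or the paper prescribes. It also assumes every access falls in a 4KB page (no huge pages).

```python
# Sketch: turn a list of virtual addresses into a page-level access trace.
# Assumes 4KB pages only; function names are hypothetical.

PAGE_SHIFT = 12  # 4KB = 2**12

def page_id(address):
    """Identifier shared by all addresses within the same 4KB page."""
    return address >> PAGE_SHIFT

def page_trace(addresses):
    """Map addresses to page ids, dropping consecutive repeats of one page."""
    trace = []
    for address in addresses:
        pid = page_id(address)
        if not trace or trace[-1] != pid:
            trace.append(pid)
    return trace

# 0x1000 and 0x1ff8 share a page; 0x2000 is the next page.
print(page_trace([0x1000, 0x1ff8, 0x2000, 0x1004]))
```

Keep the full, uncollapsed sequence instead if the analysis needs access counts per page.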

Alternatively, you could do address & PAGE_MASK, where PAGE_MASK is just 0xfffffffffffff000 (that is, ~((1UL << PAGE_SHIFT) - 1)). This yields the page's base address rather than its page number, but it distinguishes pages equally well.
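
A small sketch comparing the two forms, assuming 64-bit addresses and 4KB pages. Python integers are unbounded, so the mask is truncated to 64 bits explicitly here to mirror the C expression; the helper names are illustrative.

```python
# Sketch: shift-based page number versus mask-based page base address.
# Assumes 64-bit addresses and 4KB pages; helper names are hypothetical.

PAGE_SHIFT = 12
# Python ints are unbounded, so truncate to 64 bits to mimic ~(...) on 1UL.
PAGE_MASK = ~((1 << PAGE_SHIFT) - 1) & 0xFFFFFFFFFFFFFFFF

def page_id(address):
    """Page number: the address with the 12 offset bits shifted out."""
    return address >> PAGE_SHIFT

def page_base(address):
    """Page base address: the address with the 12 offset bits cleared."""
    return address & PAGE_MASK

addr = 0x7ffd1234abcd
print(hex(PAGE_MASK), hex(page_id(addr)), hex(page_base(addr)))
```

Both values identify the same page, since the base address is just the page number shifted left by PAGE_SHIFT.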

edited May 05 '20 at 14:55

answered May 05 '20 at 14:49

Fantastic. This is exactly what I am looking for. Thank you very much! – lllllllllllll May 05 '20 at 15:44
