I have a big shared memory region (4 MB) allocated between two processes. My process-1 writes into this memory pool, using it as a circular buffer, in chunks of 256 bytes, one after the other. My process-2 reads from the memory. I am using locks for synchronization. I was measuring the write time, and I could see spikes at every 16th operation. My guess is that this is because each 16th write touches a new page (16 * 256 bytes = 4096 bytes, one page).
Since this happens at a critical point in my program and causes high latency, I decided to warm up process-1's page table by touching the mempool right after the constructor allocates/binds the shared memory.
// global, so the compiler can't optimise the reads away
unsigned char dummy_byte = 0;

for (int i = 0; i < 16 * 1024; i++)
{
    dummy_byte ^= buffer[i * 256];
}
The objective is to read one byte from each 256-byte chunk, so that all the pages get faulted in. I'm using a global variable because otherwise the compiler optimises the loop away (since no one reads the accumulated value). I later verified with objdump that this code doesn't get removed.
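An equivalent way to keep the reads from being eliminated, instead of accumulating into a global, would be to read through a volatile pointer; this is a sketch of that variant (the function name is mine):

```c
#include <stddef.h>

/* Touch one byte per 256-byte chunk through a volatile pointer, so the
 * compiler must emit every load even though the values are discarded.
 * Returns the number of bytes touched, just so the effect is checkable. */
size_t warm_pages_read(const unsigned char *buf, size_t len)
{
    const volatile unsigned char *p = buf;
    size_t touched = 0;
    for (size_t off = 0; off < len; off += 256) {
        (void)p[off];   /* load is kept; result discarded */
        touched++;
    }
    return touched;
}
```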
The problem I'm facing is that the latency spikes are still there. While playing with the warm-up logic, I tried this:
for (int i = 0; i < 16 * 1024; i++)
{
    buffer[i * 256] = 0;
}
I found that this eliminates the latency spikes at the critical point. The problem is that I do not want to write junk into the buffer, since there might be useful data in it, and a read-then-write of the same byte may cause a race condition, because the other process might be reading from the shared memory at the same time.
I want to know:
- Is it indeed happening because of the TLB? Or something else?
- If it is the TLB, why is a read unable to warm up the pages while a write is able to?
- Is there anything more that can be experimented with?