I am trying to collect block-layer information about the writes that specific target processes perform on block devices. Specifically, I want to find (1) the starting sector of the write, (2) the number of sectors written, and (3) the time of the write.
I wrote a rudimentary bcc (Python) script that collects this information by probing the `block_rq_complete` tracepoint. The script is as follows:
```python
from bcc import BPF

PROG = """
#include <linux/blkdev.h>
#include <linux/sched.h>

struct data_t {
    u32 pid;
    u64 ts;
    u64 sector_start;
    u64 nr_sectors;
    char comm[TASK_COMM_LEN];
};

BPF_PERF_OUTPUT(events);

TRACEPOINT_PROBE(block, block_rq_complete) {
    struct data_t data = {};
    // The upper 32 bits hold the tgid (the user-space PID). This is the task
    // that happens to be on-CPU when the completion is processed.
    data.pid = bpf_get_current_pid_tgid() >> 32;
    data.ts = bpf_ktime_get_ns();          // CLOCK_MONOTONIC timestamp in ns
    data.sector_start = args->sector;      // in 512-byte sectors
    data.nr_sectors = args->nr_sector;
    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    events.perf_submit(args, &data, sizeof(data));
    return 0;
}
"""

def print_event(cpu, data, size):
    event = b["events"].event(data)
    # comm arrives as a NUL-padded byte string
    print(event.pid, event.comm.decode("utf-8", "replace"), event.ts,
          event.sector_start, event.nr_sectors)

b = BPF(text=PROG)
b["events"].open_perf_buffer(print_event)
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
```
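The `sector_start` and `nr_sectors` values are in the block layer's fixed 512-byte sector unit, independent of the device's logical block size. If byte ranges are more convenient, a minimal conversion could look like this (the helper name is hypothetical):

```python
from typing import Tuple

SECTOR_SIZE = 512  # block-layer sector unit (SECTOR_SHIFT = 9)


def sector_range_to_bytes(sector_start: int, nr_sectors: int) -> Tuple[int, int]:
    """Return (byte_offset, byte_length) for a range of 512-byte sectors."""
    return sector_start * SECTOR_SIZE, nr_sectors * SECTOR_SIZE


if __name__ == "__main__":
    offset, length = sector_range_to_bytes(2048, 8)
    print(f"offset={offset} length={length}")  # offset=1048576 length=4096
```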
When I run this script, I find that the majority of `block_rq_complete` events are attributed to `swapper`. I suspect many of these are writebacks of dirty pages from the page cache, and that if the process does not explicitly call `fsync`, its pid might never show up in the events; instead, the corresponding writes would appear to be performed by `swapper`.
Is there a way to capture writes performed specifically for (or on behalf of) my desired process?
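On attribution: the pid and comm read in a `block_rq_complete` handler belong to whatever task is on-CPU when the completion is processed, which is often the idle task (`swapper/N`). One workaround, similar in spirit to what bcc's `biosnoop` tool does with a per-request map, is to record the submitting task when the I/O is first queued and look it up at completion. The sketch below shows only the user-space bookkeeping. `SubmitterTracker`, `on_submit`, and `on_complete` are hypothetical names, and it assumes a second event stream emitted from the `block_bio_queue` tracepoint (which fires in the submitting task's context) that reports the same `dev` and `sector` values as the completion. Request merging is not handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Submitter:
    pid: int
    comm: str


@dataclass
class AttributedWrite:
    pid: Optional[int]   # None if no matching submission was seen
    comm: Optional[str]
    ts_ns: int
    sector_start: int
    nr_sectors: int


class SubmitterTracker:
    """Hypothetical helper: remember who queued I/O at (dev, sector) and
    attribute the matching completion to that task."""

    def __init__(self) -> None:
        self._pending: Dict[Tuple[int, int], Submitter] = {}

    def on_submit(self, dev: int, sector: int, pid: int, comm: str) -> None:
        # Called for each (assumed) block_bio_queue event.
        self._pending[(dev, sector)] = Submitter(pid, comm)

    def on_complete(self, dev: int, sector: int, nr_sectors: int,
                    ts_ns: int) -> AttributedWrite:
        # Called for each block_rq_complete event; removes the matched entry.
        sub = self._pending.pop((dev, sector), None)
        return AttributedWrite(
            pid=sub.pid if sub else None,
            comm=sub.comm if sub else None,
            ts_ns=ts_ns,
            sector_start=sector,
            nr_sectors=nr_sectors,
        )


if __name__ == "__main__":
    tracker = SubmitterTracker()
    tracker.on_submit(dev=8 << 20, sector=2048, pid=1234, comm="myapp")
    print(tracker.on_complete(dev=8 << 20, sector=2048, nr_sectors=8, ts_ns=10))
    print(tracker.on_complete(dev=8 << 20, sector=4096, nr_sectors=8, ts_ns=20))
```

Two caveats apply. For buffered writes flushed in the background, the submitting task is usually a kernel flusher thread (`kworker`), not your process, so this mainly helps for direct I/O or for writeback triggered by `fsync` in your process's own context. Also, events from different per-CPU perf buffers are not globally ordered, so doing the lookup in a BPF hash map inside the kernel is more robust than doing it in Python.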
Is there also a way to sift through the large number of events attributed to `swapper`? For example, events show up even when I am writing to the terminal or running `clear` (surprisingly, they occur on the SSD, which I identified by its major and minor numbers).
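To cut down the noise, filtering by device and direction may be more useful than filtering by process name at completion time. The sketch below assumes you add `dev` and `rwbs` fields to `data_t` (copied from `args->dev` and `args->rwbs`) and decode `rwbs` to a string in `print_event`; the helper names are hypothetical. The `dev` value in the block tracepoints uses the kernel's internal `dev_t` layout (minor in the low 20 bits, major above), which differs from the user-space encoding that `os.major()`/`os.minor()` expect, so it is decoded by hand here.

```python
import os
from typing import Tuple

MINORBITS = 20  # kernel-internal dev_t: minor in the low 20 bits, major above


def decode_kernel_dev(dev: int) -> Tuple[int, int]:
    """Split a tracepoint dev value (kernel dev_t) into (major, minor)."""
    return dev >> MINORBITS, dev & ((1 << MINORBITS) - 1)


def is_write(rwbs: str) -> bool:
    """The rwbs op character is 'W' for writes (e.g. "W", "WS", "FWS")."""
    return "W" in rwbs


def keep_event(dev: int, rwbs: str, target: Tuple[int, int]) -> bool:
    """Keep only write requests to the target (major, minor) device."""
    return is_write(rwbs) and decode_kernel_dev(dev) == target


def target_of(device_path: str) -> Tuple[int, int]:
    """(major, minor) of a device node via stat(). block_rq_* events report
    the whole disk, so pass the disk node rather than a partition."""
    rdev = os.stat(device_path).st_rdev
    return os.major(rdev), os.minor(rdev)


if __name__ == "__main__":
    ssd = (259, 0)  # example value; use target_of("/dev/...") for a real disk
    print(keep_event((259 << 20) | 0, "WS", ssd))  # True
    print(keep_event((259 << 20) | 0, "RA", ssd))  # False
```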
I would be extremely grateful for any help. Thank you!