
I want to write a C program that triggers execution of a BPF program when a syscall is executed on a specific CPU by any process/thread.

So the idea is to do a perf_event_open(&pattr, -1, {MY_CPU_NUM}, -1, 0) followed by ioctl(efd, PERF_EVENT_IOC_SET_BPF, prog_fd). My BPF program increments a counter in a map, which I then read from user space.
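
A minimal sketch of the kind of BPF program I mean (libbpf style; the map and function names are illustrative, not my exact code):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Single-slot array map holding the counter read from user space. */
    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } counter_map SEC(".maps");

    SEC("tracepoint/syscalls/sys_exit_unlinkat")
    int count_unlinkat(void *ctx)
    {
        __u32 key = 0;
        __u64 *val = bpf_map_lookup_elem(&counter_map, &key);

        if (val)
            __sync_fetch_and_add(val, 1);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";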

The specific tracepoint I am using in my example is syscalls:sys_exit_unlinkat, and I am testing the program with the command taskset --cpu-list {ANY_CPU_OTHER_THAN_MY_CPU_NUMBER} rm -rf {DIRECTORY}.

I expect that if I run the remove command on a different core than the one where I placed my perf event, I should not see my counter increment. However, the counter increments irrespective of the cpu argument I provide to perf_event_open.

I don't understand why!

I also checked what perf record -C XX does: it issues a bunch of perf_event_open calls, including one with PERF_TYPE_TRACEPOINT and arguments similar to mine, and it behaves correctly, showing output only when rm -rf is executed on {MY_CPU_NUM}.
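
(For reference, I inspected it roughly like this; the exact perf flags may differ from what I actually ran:)

    strace -f -e trace=perf_event_open perf record -C {MY_CPU_NUM} -e 'syscalls:sys_exit_unlinkat' -- sleep 10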

Code Snippet:

    struct perf_event_attr pattr;
    int efd, ret;

    memset(&pattr, 0, sizeof(pattr)); /* unused fields must be zero */
    pattr.type = PERF_TYPE_TRACEPOINT;
    pattr.size = sizeof(pattr);
    pattr.config = 721; /* tracepoint id of syscalls:sys_exit_unlinkat (723 would be rmdir) */
    pattr.sample_period = 1;
    pattr.wakeup_events = 1;
    pattr.disabled = 1;
    pattr.exclude_guest = 1;
    pattr.sample_type = PERF_SAMPLE_RAW;

    efd = perf_event_open(&pattr, -1, 0, -1, 0); /* pid = -1, cpu = 0: any thread, CPU 0 only */
    if (efd < 0) {
        printf("error in efd opening, %s\n", strerror(errno));
        exit(1);
    }
    ret = ioctl(efd, PERF_EVENT_IOC_SET_BPF, prog_fd); /* attach the loaded BPF program */
    if (ret < 0) {
        printf("PERF_EVENT_IOC_SET_BPF error: %s\n", strerror(errno));
        exit(-1);
    }
    ret = ioctl(efd, PERF_EVENT_IOC_ENABLE, 0); /* start the event */
    if (ret < 0) {
        printf("PERF_EVENT_IOC_ENABLE error: %s\n", strerror(errno));
        exit(-1);
    }
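
For completeness: glibc has no wrapper for perf_event_open (it must be invoked via syscall(2)), and for PERF_TYPE_TRACEPOINT the config value is the tracepoint id, which is machine-specific and can be read from tracefs. A sketch of both (the helper names are mine):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    /* glibc provides no perf_event_open wrapper; call it directly. */
    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    /* Read a tracepoint id, e.g. from
     * /sys/kernel/debug/tracing/events/syscalls/sys_exit_unlinkat/id */
    static int tracepoint_id(const char *path)
    {
        FILE *f = fopen(path, "r");
        int id = -1;

        if (f) {
            if (fscanf(f, "%d", &id) != 1)
                id = -1;
            fclose(f);
        }
        return id;
    }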

Output of uname -a on my machine: Linux zephyr 5.4.0-110-generic.

EDIT-1:

Okay, I tried some noob debugging by running the kernel under gdb and trying to figure out the issue.

So, in the syscall-exit path, perf_syscall_exit() (kernel/trace/trace_syscalls.c) is called, which then checks whether there is a perf event associated with the current CPU.

Code snippet:

static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
{
...
        syscall_nr = trace_get_syscall_nr(current, regs);
        if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
                return;
        if (!test_bit(syscall_nr, enabled_perf_exit_syscalls))
                return;

        sys_data = syscall_nr_to_meta(syscall_nr);
        if (!sys_data)
                return;

        head = this_cpu_ptr(sys_data->exit_event->perf_events);
        valid_prog_array = bpf_prog_array_valid(sys_data->exit_event);
        if (!valid_prog_array && hlist_empty(head)) // <--- WATCH
                return;

...

Now, in the above code, see where I commented WATCH. What it checks, I think, is: if there is no valid BPF program array and the per-CPU event list is empty, return early. So if a BPF program is attached (the program array is valid) but the per-CPU event list is empty, then irrespective of whether this CPU has an event attached, this check will not trigger and we will go ahead and execute the BPF program.

So I checked by installing the perf event without attaching a BPF program, and I saw that the check triggered and we returned early when rm -rf {DIRECTORY} was executed from a different CPU. And when I executed it on core 0 (where the event was attached), the check did not trigger and execution proceeded.

So does that mean that, in the kernel, we cannot attach a BPF program to a perf event that is tied to a specific CPU? Is this a kernel bug, or a design necessity?

  • I'm unsure if it's a bug or by design, but you can work around it by using a per-CPU map or by checking the current CPU in your BPF program (see the sketch after these comments). – pchaigno Sep 27 '22 at 09:16
  • @pchaigno, yeah, I can think of a workaround using a per-CPU map. But I want to know whether this is indeed true or I am making some obvious mistake. – zephyr0110 Sep 27 '22 at 13:08
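
A sketch of the CPU-check workaround pchaigno suggests, reusing the counter_map from the earlier sketch (MY_CPU_NUM is a placeholder to be defined as the pinned CPU):

    SEC("tracepoint/syscalls/sys_exit_unlinkat")
    int count_unlinkat(void *ctx)
    {
        __u32 key = 0;
        __u64 *val;

        /* Filter inside the program, since the perf event's cpu
         * argument does not gate BPF execution here. */
        if (bpf_get_smp_processor_id() != MY_CPU_NUM)
            return 0;

        val = bpf_map_lookup_elem(&counter_map, &key);
        if (val)
            __sync_fetch_and_add(val, 1);
        return 0;
    }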
