The idea of my eBPF program is to trace data on some scheduler-related tracepoints such as sched_wakeup.

For various reasons, I need to know which cgroup is involved when these tracepoints are triggered.

To achieve that, I've found a way to get the cgroup name through bpf_get_current_task() -> cgroups -> subsys -> cgroup -> kn -> name. And the name is a variable of type char *.

So I want to create an output BPF map that my main Go program can read. The key of this map is of type char * and stores one cgroup's name (basically its file system path), and the value of this map is of type u64, for example.

It looks like the map does return some address value such as 0x00000000 (just a random address). So in Go I use cilium/ebpf to read it into a var cgroupName unsafe.Pointer. When I want to print it, I use *(*string)(cgroupName), but it only prints '' (an empty value).

Is this because the address is in kernel space, or on the BPF stack, or is otherwise an address that my Go program (which runs in user space) cannot access? Or is there something wrong with my whole idea?

To make it clearer, you can refer to the code below:

bpf.c

    #include "vmlinux.h"
    #include "bpf_helpers.h"
    #include "bpf_core_read.h"

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, u32);
        __type(value, char *);
    } pid_cgroup_name SEC(".maps");

    SEC("tp/sched/sched_wakeup")
    int handle__sched_wakeup(struct sched_wakeup_tp_args *ctx)
    {
        struct task_struct *task = (void *)bpf_get_current_task();

        return trace_enqueue(task);
    }

    static __always_inline
    int trace_enqueue(struct task_struct *task)
    {
        u32 pid;
    
        struct css_set *cgroups;
        struct cgroup_subsys_state *subsys[14];
        struct cgroup *cg;
        struct kernfs_node *kn;
        
        char *cgroup_name;
        
        bpf_core_read(&cgroups, sizeof(cgroups), &task->cgroups);
        bpf_core_read(&subsys, sizeof(subsys), &cgroups->subsys);
        bpf_core_read(&cg, sizeof(cg), &subsys[1]->cgroup);
        bpf_core_read(&kn, sizeof(kn), &cg->kn);
        bpf_core_read(&cgroup_name,sizeof(cgroup_name),&kn->name);
        
        if (!cgroup_name)
            return 0;

        bpf_core_read(&pid, sizeof(pid), &task->tgid);

        bpf_map_update_elem(&pid_cgroup_name, &pid, &cgroup_name, 0);
        return 0;
    }

main.go

    package main

    import (
        "C"
        "github.com/cilium/ebpf/link"
        "github.com/cilium/ebpf/rlimit"
        "log"
        "time"
    )

    // $BPF_CLANG and $BPF_CFLAGS are set by the Makefile.
    //go:generate go run github.com/cilium/ebpf/cmd/bpf2go -cc $BPF_CLANG -cflags $BPF_CFLAGS bpf bpf.c -- -I../headers -I../csl-headers

    func main() {
        // Allow the current process to lock memory for eBPF resources.
        if err := rlimit.RemoveMemlock(); err != nil {
            log.Fatal(err)
        }

        // Load pre-compiled programs and maps into the kernel.
        objs := bpfObjects{}
        if err := loadBpfObjects(&objs, nil); err != nil {
            log.Fatalf("loading objects: %v", err)
        }
        defer objs.Close()

        
        tpWakeup, err := link.Tracepoint("sched", "sched_wakeup", objs.HandleSchedWakeup, nil)
        if err != nil {
            log.Fatalf("opening tracepoint: %s", err)
        }
        defer tpWakeup.Close()

        ticker := time.NewTicker(2 * time.Second)
        defer ticker.Stop()

        log.Println("Waiting for events..")

        for range ticker.C {
            mapIterator := objs.PidCgroupName.Iterate()
            var pid, uint32
            for mapIterator.Next(&pid, cgroupName) {
                log.Printf("get pid %v for cgroup name: %s", pid, cgroupName) 
            }
        }
    }

Because main.go uses cilium/ebpf, you first need to run go generate, which produces the bpf_bpfel.go file that main.go depends on.

Then you can run go run main.go bpf_bpfel.go to see some results.

It looks like this:

2023/03/08 16:20:11 get pid 12345 for cgroup name:

You can see that the cgroup name is printed as an empty string.

Dylan Reimerink
54vault
1 Answer

One issue here is that you are trying to pass a kernel pointer to userspace and are expecting that to work. I can't tell from the code you submitted what type cgroupName is, but in any case it seems like you are not dereferencing the pointer to the C string since that would almost certainly cause a SEGFAULT.

Instead, you should copy the string. Start by changing the map's value type to a fixed-size char array with some maximum capacity:

    #define MAX_SIZE 128
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 10240);
        __type(key, u32);
        __type(value, char[MAX_SIZE]);
    } pid_cgroup_name SEC(".maps");

Then in trace_enqueue we also change cgroup_name to be an array of the same size. We can use the bpf_core_read_str function to do a CO-RE read of the string, giving it the max size of our array. We can then write the array into the map.

    static __always_inline
    int trace_enqueue(struct task_struct *task)
    {
        u32 pid;

        struct css_set *cgroups;
        struct cgroup_subsys_state *subsys[14];
        struct cgroup *cg;
        struct kernfs_node *kn;

        char cgroup_name[MAX_SIZE];
        long name_len;

        bpf_core_read(&cgroups, sizeof(cgroups), &task->cgroups);
        bpf_core_read(&subsys, sizeof(subsys), &cgroups->subsys);
        bpf_core_read(&cg, sizeof(cg), &subsys[1]->cgroup);
        bpf_core_read(&kn, sizeof(kn), &cg->kn);
        name_len = bpf_core_read_str(&cgroup_name, MAX_SIZE, &kn->name);

        if (name_len < 0)
            return 0;

        bpf_core_read(&pid, sizeof(pid), &task->tgid);

        bpf_map_update_elem(&pid_cgroup_name, &pid, &cgroup_name, 0);
        return 0;
    }

On the Go side the map value can be interpreted as [128]byte. You can slice it (value[:]), then use ByteSliceToString from golang.org/x/sys/unix to cut it at the first null byte and convert it to a string.

Dylan Reimerink
  • You are right about `bpf_core_read_str`. One more question: I once tried using `memcpy` to copy `&kn->name` into a char array and pass that array to my Go program. As with dereferencing the kernel pointer directly in Go, it could not read the content of the char array either. Do you know why this happens? Anyway, `bpf_core_read_str` solves everything! – 54vault Mar 13 '23 at 06:11
  • The issue with using memcpy on a string is that strings in C, and thus in the kernel, are NUL-terminated. That is also the key part of bpf_core_read_str: it reads until the destination size is reached or a \0 is found in the source. Without an example, I can't explain the exact scenario you are describing, in which you see nothing at all. I would expect that after the \0 of the string you might see "garbage", that being memory after the string which belongs to a different variable than the one you were intending to read. – Dylan Reimerink Mar 13 '23 at 14:30
  • - "The issue with using memcpy on a string is that strings in C, and thus in the kernel, are NUL-terminated." Sorry, but I don't entirely understand this. My example is just like my original question: if I replace `char * cgroupname` with `char cgroupname[MAX_SIZE]` and use `memcpy` to copy `&kn->name` into `char cgroupname[MAX_SIZE]`, the Go program still prints nothing. So I don't know why bpf_core_read_str can copy a char * into a char array and let Go print it out while memcpy cannot. @Dylan Reimerink – 54vault Mar 14 '23 at 06:53