I was looking at the different types of BPF programs and noticed that the context is passed differently depending on the program type.

Example:

  1. For program type BPF_PROG_TYPE_SOCK_OPS, an object of type struct bpf_sock_ops_kern is passed. However, the BPF program of this type takes a reference to struct bpf_sock_ops. Why is it done this way and where is the "translation" from bpf_sock_ops_kern to bpf_sock_ops?

  2. For program type BPF_PROG_TYPE_CGROUP_SKB, an object of type struct sk_buff is passed (e.g., in __cgroup_bpf_run_filter_skb), but the BPF program expects a minimized version, struct __sk_buff.

So I looked at the struct bpf_verifier_ops function callbacks, but they seem to only adjust the offsets in BPF instructions, as they are called by the BPF verifier.

I'd be glad if someone could shed light on how the BPF context is defined. Thanks.

pchaigno
Mark
    For the first one, [`bpf_sock_ops_kern`](https://elixir.bootlin.com/linux/latest/source/include/linux/filter.h#L981) is just a subset of [`bpf_sock_ops`](https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/bpf.h#L931). To convert, `sock_filter_convert_ctx_access` only needs to advance the pointer after the first `sk` field. The verifier will then ensure that fields after the union are not accessed. I haven't looked into the second case yet. – pchaigno Mar 01 '18 at 14:38
    Okay, so for the second one: [`bpf_convert_ctx_access`](https://elixir.bootlin.com/linux/v4.15.7/source/net/core/filter.c#L3896) matches on each possible required offset on `__sk_buff`, one by one, and converts them to the equivalent offset in the `sk_buff` object. Does that answer your question? I'll make a proper answer if that's the case. – pchaigno Mar 01 '18 at 14:43
    I don't think it has to do with performance. These data structures are used to limit the access of BPF programs to a handful of fields; only fields that programs should be able to access are in the mirror structures (e.g., `bpf_sock_ops` and `__sk_buff`). For example, you can see the process for `__sk_buff` described by Alexei [here](https://github.com/torvalds/linux/commit/9bac3d6d548e5cc925570b263f35b70a00a00ffd), with more details in [the PATCH description](https://lwn.net/Articles/636647/). – pchaigno Mar 01 '18 at 21:08
    As far as I can tell pchaigno is right, `struct __sk_buff` has little to do with performance but is used mostly for simplicity, to offer a cleaner interface to BPF users (only offer the fields that can be accessed from BPF). It's converted in the verifier with `bpf_convert_ctx_access`, as mentioned already. Then you have additional checks in `net/core/filter.c` (for networking), to make sure the user can read from, possibly write to, each of the fields of the struct. See `tc_cls_act_is_valid_access()` function for example. (I'm less familiar with tracing bits.) – Qeole Mar 02 '18 at 16:29
  • @pchaigno, thanks for responses! You can make an official answer, which I can accept, so that others can benefit :-) You probably can incorporate Qeole's comment, as it is also useful. – Mark Mar 02 '18 at 21:50
  • I've posted a full answer. I took the `struct bpf_sock_ops` mirror structure as an example, but the process is the same for other mirror structures. – pchaigno Mar 06 '18 at 10:27

1 Answer

The mirror objects (e.g., struct bpf_sock_ops) passed as arguments expose only a subset of the fields of the original object(s) to the BPF program. A mirror structure can also contain fields from several different original structures; in that case, the mirror object serves as an aggregate. Passing the original object(s) to the BPF program would also be misleading, as users could think they have access to all fields. For example, they could think they have access to bpf_sock_ops_kern.sk when that's actually not the case.

The verifier then converts accesses to the mirror object into accesses to the original object(s), before the program is executed for the first time. There's a conversion function for each type of mirror object (e.g., sock_ops_convert_ctx_access for the conversion of accesses to struct bpf_sock_ops). For each field of the mirror object (i.e., for each offset), the conversion function rewrites the load or store instruction so that it uses the offset of the corresponding original field.
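To make the rewriting step more concrete, here is a minimal userspace sketch of the idea. It is not kernel code: the structures, the one-field "instruction", and the `toy_*` names are all hypothetical and exist only for illustration. In the kernel, the validity check and the conversion are separate callbacks; the sketch folds rejection into the conversion function for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "original" object, as kernel code would see it. */
struct toy_ctx_kern {
    void    *private_ptr;   /* should stay hidden from the program */
    uint32_t op;
    uint32_t reply;
};

/* Hypothetical mirror object: the only layout the program knows. */
struct toy_ctx {
    uint32_t op;
    uint32_t reply;
};

/* A toy 4-byte load "instruction": read a u32 from ctx + off. */
struct toy_insn {
    size_t off;
};

/*
 * Rewrite an offset into the mirror object as the offset of the
 * matching field in the original object.
 * Returns 0 on success, -1 if the offset matches no exposed field.
 */
static int toy_convert_ctx_access(struct toy_insn *insn)
{
    switch (insn->off) {
    case offsetof(struct toy_ctx, op):
        insn->off = offsetof(struct toy_ctx_kern, op);
        return 0;
    case offsetof(struct toy_ctx, reply):
        insn->off = offsetof(struct toy_ctx_kern, reply);
        return 0;
    default:
        return -1;
    }
}

/* Execute an already rewritten load against the original object. */
static uint32_t toy_run_load(const struct toy_ctx_kern *ctx,
                             const struct toy_insn *insn)
{
    uint32_t val;

    assert(insn->off + sizeof(val) <= sizeof(*ctx));
    memcpy(&val, (const char *)ctx + insn->off, sizeof(val));
    return val;
}
```

A program written against `struct toy_ctx` would issue a load at `offsetof(struct toy_ctx, reply)`; after `toy_convert_ctx_access` the same instruction reads `reply` from `struct toy_ctx_kern`, and `private_ptr` is never reachable because no mirror offset maps to it.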

Note that the original fields are not necessarily all in the same object. For example, in the mirror object struct bpf_sock_ops, the fields op and family are retrieved from bpf_sock_ops_kern.op and bpf_sock_ops_kern.sk->skc_family respectively.
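The case where a mirror field lives behind a pointer can be sketched in the same toy style: one load on the mirror object becomes two loads, first the pointer and then the field it points to. Again, this is a rough userspace illustration under assumed names (`demo_*` structures, opcodes, and functions are hypothetical), not the kernel's instruction encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original objects: the family lives behind a pointer. */
struct demo_sock {
    uint16_t family;
};

struct demo_sock_ops_kern {
    struct demo_sock *sk;
    uint32_t          op;
};

/* Hypothetical mirror object: flat, both fields look local. */
struct demo_sock_ops {
    uint32_t op;
    uint32_t family;
};

enum demo_opcode { DEMO_LOAD_SK_PTR, DEMO_LOAD_U16, DEMO_LOAD_U32 };

struct demo_insn {
    enum demo_opcode code;
    size_t           off;
};

/*
 * Convert one load on the mirror object into one or two loads on
 * the original objects. Returns the number of instructions written
 * to out, or 0 if the offset matches no exposed field.
 */
static int demo_convert(size_t mirror_off, struct demo_insn out[2])
{
    switch (mirror_off) {
    case offsetof(struct demo_sock_ops, op):
        out[0] = (struct demo_insn){ DEMO_LOAD_U32,
                     offsetof(struct demo_sock_ops_kern, op) };
        return 1;
    case offsetof(struct demo_sock_ops, family):
        /* First fetch kern->sk, then read sk->family. */
        out[0] = (struct demo_insn){ DEMO_LOAD_SK_PTR,
                     offsetof(struct demo_sock_ops_kern, sk) };
        out[1] = (struct demo_insn){ DEMO_LOAD_U16,
                     offsetof(struct demo_sock, family) };
        return 2;
    default:
        return 0;
    }
}

/* Run the converted instructions, starting from the original object. */
static uint32_t demo_run(const struct demo_sock_ops_kern *ctx,
                         const struct demo_insn *insns, int n)
{
    const char *base = (const char *)ctx;
    uint32_t val = 0;

    assert(n >= 0 && n <= 2);
    for (int i = 0; i < n; i++) {
        const char *src = base + insns[i].off;

        switch (insns[i].code) {
        case DEMO_LOAD_SK_PTR: {
            struct demo_sock *p;
            memcpy(&p, src, sizeof(p));
            base = (const char *)p;   /* later loads are relative to sk */
            break;
        }
        case DEMO_LOAD_U16: {
            uint16_t v;
            memcpy(&v, src, sizeof(v));
            val = v;
            break;
        }
        case DEMO_LOAD_U32:
            memcpy(&val, src, sizeof(val));
            break;
        }
    }
    return val;
}
```

From the program's point of view, `family` is just another field of the mirror object; the pointer dereference only exists in the converted instruction sequence.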

pchaigno