
The core RCU APIs in the Linux kernel apply to all clients in the kernel, which means every reader accessing RCU-protected data is treated equally, even readers accessing totally unrelated data structures. And calls like synchronize_rcu() need to wait for all readers, even those accessing entirely unrelated data structures under the hood.

Why has the Linux kernel never added support for a per-data-object RCU? Am I missing anything here? I think the implication of the current RCU APIs is that if there are many clients in the kernel, the overall performance of RCU may suffer, since they all share a global view.

Vitt Volt

2 Answers


I think the implication of the current RCU APIs is that if there are a lot of clients in the kernel, the overall performance of RCU may suffer since they share a global view.

No, that implication is wrong. The RCU implementation in the Linux kernel scales perfectly well with the number of "clients".

You want to "replace" the single "lock object" used for RCU with multiple lock objects, so that different data could be protected by different lock objects. But the RCU implementation does not use any lock object at all!

Because of that, the RCU implementation is quite complex and relies on inner details of the Linux kernel (e.g. the scheduler), but it is worth it. For example, rcu_read_lock and rcu_read_unlock work much faster than any sort of spin_lock, because there is no memory contention with other cores; in a non-preemptible kernel, rcu_read_lock() compiles down to essentially nothing.


Actually, "lock objects" are used by the sleepable version of RCU (SRCU). See e.g. that article.

Tsyvarev
  • I understand that `rcu_read_lock()` works much faster than any lock, but my specific question is why there's no alternative version of the RCU API like `rcu_read_lock(struct rcu_meta *rcu)` and `synchronize_rcu(struct rcu_meta *rcu)`? Why are all readers in different code treated the same? – Vitt Volt Dec 17 '20 at 06:32
  • With an additional synchronization object (like `struct rcu_meta *rcu`), functions like `rcu_read_lock` would need to check that object. And this would imply **memory contention** with other threads that also check the object. That contention would make `rcu_read_lock` **slower**. Also, in the current implementation, RCU-protected sections can be nested but deadlock is impossible; with an additional parameter to `rcu_read_lock`, that "deadlock-free" property would be lost. So why would they want a new API that makes their code slower and more difficult to use? – Tsyvarev Dec 17 '20 at 09:16

I've been asking myself the same question recently, and my reasoning is as follows:

  • synchronize_rcu() only waits for readers that were already inside an RCU critical section at the time it is called, so "new" clients are not an issue
  • blocking is not allowed inside an RCU critical section, so readers cannot linger for long and stall synchronize_rcu()
  • there is a limited number of other "contenders": at most one per logical CPU

If my understanding/reasoning is correct, I suspect the overhead of any possible contention remains low enough not to justify client-specific RCU locks. I don't know how this would scale to a very large number of CPUs.

This is based on my understanding and reading of kernel code, assuming CONFIG_TREE_RCU. Things may change with other implementations (e.g. with CONFIG_TINY_RCU, which targets uniprocessor systems, synchronize_rcu() is essentially empty). Also, AFAIU, in some specific cases preemption may occur within the critical section (e.g. with CONFIG_PREEMPT_RCU?). I don't know whether this changes things much, but I don't expect it to (as probably only a higher-priority task could preempt an RCU critical section?).