Imagine you create a cpuset cgroup that isolates n logical cores from the general Linux scheduler. Then, one at a time, you create and run m processes that together comprise n threads, so # of process threads == # of logical cores.
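For concreteness, here's roughly the setup I mean, as a sketch. It assumes cgroup v2 mounted at `/sys/fs/cgroup`, root privileges, and a kernel recent enough to support `cpuset.cpus.partition`; the group name `runtime-pool` and the CPU range are placeholders:

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")
GROUP = CGROUP_ROOT / "runtime-pool"  # placeholder name

def create_cpuset(cpus: str = "4-7") -> None:
    # Delegate the cpuset controller to children of the root cgroup.
    (CGROUP_ROOT / "cgroup.subtree_control").write_text("+cpuset")
    GROUP.mkdir(exist_ok=True)
    # Restrict the group to n logical cores (here n = 4: CPUs 4-7).
    (GROUP / "cpuset.cpus").write_text(cpus)
    # Carve these CPUs out as an isolated scheduling partition, so the
    # general scheduler stops load-balancing other tasks onto them.
    (GROUP / "cpuset.cpus.partition").write_text("root")

def add_process(pid: int) -> None:
    # Move a process (and all of its threads) into the group.
    (GROUP / "cgroup.procs").write_text(str(pid))
```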
I'm trying to decide whether it's worth writing my own process/thread -> logical-core scheduler in my container runtime, one that pins each thread to a specific logical core so that each process's threads span as few physical cores as possible, maximizing cache locality (shared L1/L2 between SMT siblings)... or whether to just let Linux schedule threads as it sees fit across the range of logical cores owned by the cgroup.
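The pinning policy would look something like the sketch below. The sysfs topology files and `os.sched_setaffinity` are real Linux interfaces; the grouping policy and both helper functions are just my own illustration, and it assumes exactly one logical CPU per thread:

```python
import os
from collections import defaultdict
from pathlib import Path

def cpus_by_physical_core(cpus: set[int]) -> list[list[int]]:
    """Group logical CPUs into SMT sibling sets (one set per physical core)."""
    groups: dict[tuple[int, int], list[int]] = defaultdict(list)
    for cpu in sorted(cpus):
        topo = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology")
        pkg = int((topo / "physical_package_id").read_text())
        core = int((topo / "core_id").read_text())
        groups[(pkg, core)].append(cpu)
    return list(groups.values())

def pin_process(pid: int, cores: list[list[int]]) -> None:
    """Pin each thread of `pid` to one logical CPU, filling the SMT siblings
    of one physical core before spilling onto the next."""
    flat = [cpu for core in cores for cpu in core]
    tids = sorted(int(t) for t in os.listdir(f"/proc/{pid}/task"))
    for tid, cpu in zip(tids, flat):
        # sched_setaffinity(2) operates per-task, so a TID is valid here.
        os.sched_setaffinity(tid, {cpu})
```

Note the fill-siblings-first order is exactly the "fewest physical cores" policy; iterating one CPU per physical core first instead would be the "less contention" policy I describe next.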
My intuition tells me that if # of process threads == # of logical cores, Linux must already implicitly converge on this optimally distributed configuration, with the added advantage of flexibility: it can trade cache locality for less contended execution resources by moving one of two threads sharing a physical core to a totally idle physical core whose logical siblings are currently sleeping.
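One way to test that intuition empirically rather than assume it: periodically sample which logical CPU each thread last ran on (the "processor" field, field 39 of `/proc/<pid>/task/<tid>/stat` per proc(5)) and check whether Linux is already spreading one thread per physical core. This is only a point-in-time sample, not a guarantee of steady-state placement:

```python
from pathlib import Path

def last_cpu_per_thread(pid: int) -> dict[int, int]:
    """Map each TID of `pid` to the logical CPU it last executed on."""
    placement = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        stat = (task / "stat").read_text()
        # comm (field 2) may contain spaces or parens, so split after the
        # closing ')' and count from there: field 39 overall == index 36.
        fields = stat.rsplit(")", 1)[1].split()
        placement[int(task.name)] = int(fields[36])
    return placement
```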
Wondering if anyone has insight to confirm or correct these assumptions?