If you have more than one socket (a physical CPU package, colloquially a "stone"), you have a NUMA system. See https://en.wikipedia.org/wiki/Non-uniform_memory_access for background.
Try to keep your workload on CPUs from the same socket. Below I will explain why and how to do that.
First, determine which CPU IDs are located on each socket:
% numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 24565 MB
node 0 free: 2069 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 24575 MB
node 1 free: 1806 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
Here "node" means "socket" (stone). So 0,2,4,6 CPUs are located on the same node.
And it makes sense to move all IRQs into one node to use L3 cache for set of CPUs.
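You can also see the same CPU-to-node mapping with lscpu; the "NUMA nodeN CPU(s)" lines it prints will match the numactl output above:
% lscpu | grep -i numa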
- Isolate all CPUs except 0,2,4,6 from the general scheduler.
You need to add a parameter to the Linux kernel command line:
isolcpus=cpu_number[,cpu_number,...]
for example (this system has CPUs 0-23)
isolcpus=1,3,5,7-23
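How exactly you add this depends on your bootloader; on a GRUB-based distro it usually looks like the sketch below (file paths and the update command vary between distros):
# in /etc/default/grub, append to the existing line:
GRUB_CMDLINE_LINUX="... isolcpus=1,3,5,7-23"
% update-grub                                # Debian/Ubuntu
% grub2-mkconfig -o /boot/grub2/grub.cfg     # RHEL/CentOS
Then reboot for the parameter to take effect.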
- Check which IRQs are being handled by which CPUs:
cat /proc/interrupts
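To move an IRQ, write a hex CPU mask into /proc/irq/<number>/smp_affinity (as root). A minimal sketch, assuming your NIC is eth0 and its IRQ turns out to be 24 (check the actual name and number in /proc/interrupts), and assuming the irqbalance daemon is stopped, otherwise it will rewrite your settings:
% grep eth0 /proc/interrupts          # find the NIC's IRQ number, say 24
% echo 55 > /proc/irq/24/smp_affinity # mask 0x55 = binary 01010101 = CPUs 0,2,4,6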
- Start your application with the numactl command to align it to specific CPUs and memory.
(To understand what NUMA and alignment mean here, please follow the link at the beginning of the article.)
numactl [--membind=nodes] [--cpunodebind=nodes] command [args...]
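For example, to run a hypothetical binary your_app with both its CPUs and its memory pinned to node 0:
% numactl --cpunodebind=0 --membind=0 ./your_app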
- Your question is much bigger than what I have covered here.
If you see that the system is slow, you first need to identify the bottleneck.
Gather raw stats with top, vmstat, and iostat to find the weak point (see the example below).
Post some stats from your system and I will help you tune it the right way.
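For example, a quick baseline could look like this (the intervals and counts are arbitrary):
% top -b -n 1 | head -20   # one-shot snapshot of load and top processes
% vmstat 1 5               # CPU, memory, and swap activity over 5 seconds
% iostat -x 1 5            # extended per-device disk I/O statistics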