I believe a lot of your confusion has to do with the fuzzy mapping of the capabilities of cpufreq to the actual capabilities of the hardware.
Here’s a description of what is taking place on the HW and in the OS.
A processor is a collection of cores on the same silicon substrate. The cores are what we used to call CPUs with some enhancements. CPUs now have the capability of running multiple HW threads (hyperthreading), each hardware thread being equivalent to one of the old type CPUs. Putting this all together, the 1270v3 is a quad core (if I recall correctly), meaning it has 4 cores on the same silicon substrate. Each core can support two HW threads, each HW thread being equivalent to what the OS calls a CPU (and I’ll call a virtual CPU). So from the OS perspective, the 1270v3 has 8 (virtual) CPUs.
The OS doesn’t see packages, cores or HW threads. It sees CPUs, and to it there appear to be 8 of them.
To further complicate the issue, modern processors have various HW supporting power saving states called P-states, C-states and package C-states. Why do I mention these? It’s because they are intimately associated with the frequency of the processor. And cpufreq professes to provide the user with control over the processor’s frequency.
Now, I’m not familiar with cpufreq outside of reading the manpage and other material on the web. From my reading, it has a lot of idiosyncrasies, so I’ll talk about what it is doing from a broad perspective.
In a general sense, cpufreq has a lot of generic capability that may or may not be supported by the HW or the kernel. This is confusing because it looks like the functionality is there but then things don’t happen as you would expect. For example, cpufreq gives the impression that you can set each CPU’s frequency independently. In reality, on a hyperthreaded system, two “CPUs” are associated with each core and must have the same frequency.
A lot of the functionality that cpufreq is supposed to control is associated with the power efficiency characteristics of the processor, but again, its mapping to the processor’s actual hardware capabilities is incomplete and misleading. Though cpufreq seems to allow you to set max and min frequencies, the processor hardware doesn’t work this way. In modern Intel processors, such as the 1270v3, there are something called P-states. These P-states are frequency-voltage pairs that slow down a processor’s frequency (and drop its voltage) to reduce the processor’s power consumption at the cost of the application taking longer to run. These frequency-voltage pairings aren’t arbitrary though cpufreq gives the impression that they are.
What does this all mean? In addition to the thread migration issues that the commenter mentioned, cpufreq isn’t going to behave the way you expect because it appears to have capability that it actually doesn’t, and the capability that it does actually have maps only roughly to the actual capabilities of the HW and OS.
I embedded some further comments in your text.
With a 1270v3 and a single thread app I'm at the end of performance but when I watch monitoring tools like atop I don't understand how this whole stuff works. I tried to find a nice article about this sort of topic but they either have been explained in a language I don't understand or are not about the stuff I would like to know. I hope it is alright to ask this kind of stuff here.
From my understanding a single-thread app does only use one thread for all/most of the work. [Yes, but this doesn’t mean that the thread is locked to a specific virtual CPU or core.] So the performance is defined by the single-thread power of the CPU. [It’s not that simple. The OS migrates threads around, it has its own maintenance processes, etc] A moment before I wrote this question I played around with CPU-frequency and noticed that although there are only two instances of the app running the usage is shared across all cores. So I assume that the thread jumps around between these cores. So I set the CPU scaling to performance with cpufreq-set -g performance. The result was that all CPU cores/threads stayed at about 2GHz like it was before besides one that is permanently on 3.5GHz (100%). As I only changed the scaling for one core, why is the usage still shared across all cores? I mean the app is running at about 300%, why doesn't it stick to the CPU core with the 100%? [Since I can’t see what you are observing, I don’t really understand what you are asking.]
Furthermore as I noticed that only one of the CPU's got scaled up I looked into the help page and found -r which should scale all cores with the performance settings. Unfortunately nothing does change. (Is this a bug in Ubuntu 1404?) So I used -c with the number 8 (8 threads) -> didn't work. 4 -> works but only scales 2 cores out of 8. 7 -> scaled 4 cores. [I haven’t used cpufreq so can’t directly speak to its behavior, but the manpage implies that “-c ” refers to a specific virtual CPU and not the number of virtual CPUs.] So I'm wondering, does this not support hyper-threading or is the whole program that buggy? [Again, I’m not sure from your explanation what you are doing, but the n->n/2 behavior makes sense to me. You can change the frequency of a core but since each core has two hyperthreads/virtual CPUs, two of those virtual CPUs must scale together.]
However as I understand it, the CPU's with the max frequency together with the thread jump around in the monitoring tools as they display the average the usage, which than looks like shared. Did I figure this right? [Again, I’m not sure what you are observing. Both physically and in atop, the CPU designation should not change, meaning CPU001 will always refer to the same virtual CPU. The core with the max frequency shouldn’t physically jump around, though the user thread may. Something to note is that monitoring tools can be pretty heavy users of the CPU. This heavy usage can make figuring out your processor usage difficult if it causes threads to jump around to different virtual CPUs.]
Would forcing one cpu to 3.5GHz and forcing the app to this core improve performance or is all the stuff I'm wondering about only about avg calculation between the data they show each second. [I found a pretty good explanation of atop with a lot of helpful screen shots: http://www.unixmen.com/linux-basics-monitor-system-resources-processes-using-atop/] If so am I right that I should run best with cpufreq-set -c 7 -g performance if power consumption doesn't matter? [It all depends upon what other processes are running on your system. If your system is mostly idle except for your processes, then forcing a core to a certain frequency won’t make a difference. [I’m suspicious of what a “governor” does. The language appears to refer to power-efficiency/performance (“balanced”, “powersave”, “performance”, etc) but the details don’t match the capability of today’s hardware.]
Thanks for reading so far, I hope you have a moment to help me