2

Assume I have a process that consists of two ideally independent tasks ("ideally" meaning no communication overhead). Would it be faster to run it on a single-core processor at 3GHz or on a two-core processor at 1.5GHz?

Of course, in the case of the two-core processor, the job is ideal for parallelization. On the single core, the two tasks will be time-shared.

Update: the question in other words

Is a single-core processor of double the speed always a better option than a two-core processor?

KhaledWas
  • 73
  • 5

2 Answers

3

Two ideally independent tasks running on a non-ideal OS like Windows 2012 will run faster on 2 cores at 1.5GHz than on 1 core at 3GHz, due to the elimination of thread context switching overhead.

Unfortunately, there are very few truly independent tasks.

Riad Baghbanli
  • 3,105
  • 1
  • 12
  • 20
  • Thanks for the answer. So can I say that a single-core processor of double the speed is always a better option than a two-core processor? – KhaledWas Mar 07 '16 at 16:34
  • 1
    No, Khaled. It really depends on the tasks and on the OS's own tasks. Generally, it is more or less fair to say that a 4-core 3GHz CPU will be faster than an 8-core 1.5GHz CPU for most tasks. But with Windows, a single-core CPU is always at a disadvantage due to the numerous OS processes running. – Riad Baghbanli Mar 07 '16 at 16:37
  • You mean that these numerous OS processes will incur more time-sharing overhead than the communication between parallel tasks would? – KhaledWas Mar 07 '16 at 16:39
  • 1
    Yes, Khaled. Essentially it comes down to where more computing cycles are wasted: context switching between the numerous OS service threads and the task threads, versus the overhead of thread synchronization, if any. Generalized statements rarely work in these cases. – Riad Baghbanli Mar 07 '16 at 16:46
  • Disagree. There will be the same number of context switches, equal to the total time of execution divided by the size of the time slice. – SergeyA Mar 07 '16 at 16:59
  • @SergeyA but in two-core context switches **are done in parallel**, hence less time is wasted for 2 context switches compare to a single core. – shay__ Mar 07 '16 at 21:24
  • @shay__, it doesn't matter. There will be a context switch every time slice on every core. Let's assume the time slice is 100 ms, a context switch takes 2 ms, and a process takes 10 seconds to complete on a slow core (5 on the fast one). On one fast core, you will spend 5 * 2 + 5 * 2 * 10 * 0.002 = 10.2 seconds in total. On two slow cores you will spend exactly the same number of seconds on each core, for the same wall-clock time of 10.2 seconds. – SergeyA Mar 07 '16 at 21:31
  • @SergeyA - I (and I believe rbaghbanli as well) was talking about context switches between OS tasks, not the process tasks. – shay__ Mar 07 '16 at 21:41
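SergeyA's arithmetic can be checked with a small model. This is only a sketch using the hypothetical figures from the comments (a 100 ms time slice, 2 ms per context switch, and tasks taking 5 s on the fast core or 10 s on a slow one); it counts one switch per slice on every core, as the comment assumes.

```python
# Model of the context-switch arithmetic from the comments above.
# Hypothetical figures: 100 ms time slice, 2 ms per context switch,
# two tasks needing 5 s each on the fast core (10 s each on a slow core).

SLICE = 0.100      # time slice, seconds
SWITCH = 0.002     # cost of one context switch, seconds

def wall_time_one_fast_core(work_per_task=5.0, tasks=2):
    """Two tasks time-shared on one core: total work plus one switch per slice."""
    total_work = work_per_task * tasks          # 10 s of compute in total
    switches = total_work / SLICE               # one switch per 100 ms slice
    return total_work + switches * SWITCH

def wall_time_two_slow_cores(work_per_task=10.0):
    """One task per core: each core still pays one switch every time slice."""
    switches = work_per_task / SLICE
    return work_per_task + switches * SWITCH    # wall time = one core's time

print(wall_time_one_fast_core())   # 10.2
print(wall_time_two_slow_cores())  # 10.2
```

Under these assumptions both configurations come out at 10.2 seconds, which is exactly the point of the comment: if switches happen every slice regardless, parallelizing the switches does not change the wall-clock total.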
2

The question as posted is severely underspecified. First, it appears to confuse performance with processor frequency. Even with identical core microarchitectures, memory latency is fixed in time, not in cycle counts. Traversing a billion-item linked list is a (contrived) workload that depends on memory latency, where two parallel "half-speed" threads would be faster than time-slicing.

If the lower frequency is the result not of product binning, a power-saving configuration, or the like, but of a shallower pipeline (at the same width), then the "slower" processor would have a lower branch misprediction penalty and a lower latency, in cycles, to the same cache capacity, leading to higher instructions per cycle on most workloads.
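The pipeline-depth effect can be sketched with a simple CPI model. All figures here are illustrative assumptions: a deep-pipeline 3 GHz core paying 20 cycles per branch misprediction versus a shallow-pipeline 1.5 GHz core paying 10 cycles.

```python
# Toy IPC model: shallower pipeline means a smaller misprediction penalty.
# BRANCH_RATE and MISPREDICT_RATE are assumed workload parameters.

BRANCH_RATE = 0.20        # fraction of instructions that are branches
MISPREDICT_RATE = 0.05    # fraction of branches mispredicted

def ipc(base_cpi, mispredict_penalty_cycles):
    """Instructions per cycle given a base CPI and a misprediction penalty."""
    cpi = base_cpi + BRANCH_RATE * MISPREDICT_RATE * mispredict_penalty_cycles
    return 1.0 / cpi

ipc_fast = ipc(1.0, 20)   # deep pipeline, clocked at 3 GHz
ipc_slow = ipc(1.0, 10)   # shallow pipeline, clocked at 1.5 GHz
print(ipc_slow > ipc_fast)   # True: the shallower pipeline has higher IPC
```

Note that higher IPC alone does not decide single-thread speed: in this toy model the 3 GHz core still retires more instructions per second (3 × 0.83 vs 1.5 × 0.91 billion), but two such shallow-pipeline cores together exceed the one fast core.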

Even with identical microarchitectures, two cores also avoid cache warm-up overhead from context switches. The cost of a context switch is not just the time taken to invoke the OS, run the OS scheduler (with only two active threads on two cores, scheduler overhead would be slightly lower because there are no other ready threads, but there would be twice as many timer interrupts), and swap register contents; it also includes the period after each switch during which the incoming task runs against a cold cache. (If run in batch mode, such context switch overhead would be avoided.)

Another factor to consider is whether the two tasks encounter independent bottlenecks. For example, if one task is extremely compute intensive but the other is bound by main memory bandwidth, then running them in parallel can provide better performance than time slicing; with time slicing the memory bandwidth potential is unused during the compute-intensive time slices.
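The independent-bottlenecks argument can be put into numbers with another toy model. The workload figures are illustrative assumptions: one task needing 10 s of pure ALU work at 1.5 GHz, and one task needing 10 s limited purely by DRAM bandwidth (and therefore insensitive to core frequency).

```python
# Toy model: one compute-bound task, one memory-bandwidth-bound task.
# All figures are illustrative assumptions, not measurements.

compute_task_s = 10.0   # seconds of pure ALU work at 1.5 GHz
memory_task_s = 10.0    # seconds limited by DRAM bandwidth (frequency-insensitive)

# Time-sliced on one 3 GHz core: the compute part halves, the memory part
# does not, and the memory bus sits idle during the compute slices.
single_fast = compute_task_s / 2 + memory_task_s

# Two 1.5 GHz cores: the tasks overlap, each saturating a different resource.
two_slow = max(compute_task_s, memory_task_s)

print(single_fast, two_slow)   # 15.0 10.0
```

In this sketch the two slow cores finish in 10 seconds against 15 for the fast core, simply because time slicing leaves one of the two bottleneck resources idle at any given moment.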

Yet another factor is interference over constrained resources. For example, DRAM can suffer from bank conflicts, which can substantially reduce effective bandwidth. If memory addressing and timing happen to cause maximum conflicts during parallel operation, then effective bandwidth would be reduced. A similar effect can be produced by limited associativity in a shared last-level cache.

More recent processors also tend to be thermally limited, so the double-frequency processor might not be able to sustain that frequency under maximum utilization if that frequency is not guaranteed under power-virus conditions, whereas the alternative two-core system is unlikely to encounter that power-density constraint.