
I'm experimenting with Azure VMs equipped with hyper-threaded CPUs. An F2_v2 size VM is claimed to have a single physical core with hyper-threading, and thus two "virtual" CPUs (the processor is claimed to be a 2.7 GHz Intel Xeon® Platinum 8168 (Skylake)). I use the code from this answer to create a CPU-intensive load.

static void doWork()
{
    // FindPrimeNumber comes from the linked answer: a trial-division
    // search for the n-th prime, i.e. pure integer CPU work.
    var start = DateTime.UtcNow;
    var value = FindPrimeNumber(1024 * 1024);
    var end = DateTime.UtcNow;
    Console.WriteLine(String.Format("Value {0} found in {1} seconds",
        value, (end - start).TotalSeconds));
}
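The question links to `FindPrimeNumber` rather than quoting it; as far as I can tell it is the well-known trial-division search for the n-th prime from that answer, roughly (this reconstruction is my assumption, not text from the question):

```csharp
// Presumed body of FindPrimeNumber from the linked answer: counts primes
// by trial division until the n-th one is found - pure integer work with
// no floating point and almost no memory traffic.
public static long FindPrimeNumber(int n)
{
    int count = 0;
    long a = 2;
    while (count < n)
    {
        long b = 2;
        bool prime = true;
        while (b * b <= a)
        {
            if (a % b == 0) { prime = false; break; }
            b++;
        }
        if (prime) count++;
        a++;
    }
    return --a; // a was incremented once past the n-th prime
}
```

The instruction mix matters here: two threads running this code issue the same kind of integer divisions and so compete for the same execution units of the core.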

First I just call that function from inside Main():

static void Main(string[] args)
{
    doWork();
}

and the time reported is around 32 seconds. I then launch two processes from that executable (using two consoles and starting them quickly one after the other) - each slows down and the reported time grows to around 64 seconds - exactly as expected: there's a single real CPU core and it cannot run two processes truly in parallel.

Then I try to run two threads in parallel using Parallel.Invoke():

static void Main(string[] args)
{
    Parallel.Invoke(() => doWork(), () => doWork());
}

(yes, the two threads obtain the time concurrently and write to the console concurrently afterwards, but I expect this shouldn't affect the results much) and the time reported becomes about 60 seconds - slightly faster than with two processes. This test is run from the Windows command prompt, not under a debugger.
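One way to take the per-thread timestamps and console writes out of the measurement entirely (a sketch of my own, not code from the question; `FindPrimeNumber` is re-declared here with a smaller n so it finishes quickly) is to time the same pair of tasks first sequentially and then in parallel with a single Stopwatch, and compare the two wall-clock totals:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class HtBenchmark
{
    // Same trial-division busy-work idea as the question's FindPrimeNumber.
    static long FindPrimeNumber(int n)
    {
        int count = 0;
        long a = 2;
        while (count < n)
        {
            long b = 2;
            bool prime = true;
            while (b * b <= a)
            {
                if (a % b == 0) { prime = false; break; }
                b++;
            }
            if (prime) count++;
            a++;
        }
        return --a;
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        FindPrimeNumber(100_000);
        FindPrimeNumber(100_000);
        double sequential = sw.Elapsed.TotalSeconds;

        sw.Restart();
        Parallel.Invoke(
            () => FindPrimeNumber(100_000),
            () => FindPrimeNumber(100_000));
        double parallel = sw.Elapsed.TotalSeconds;

        // On one hyper-threaded core this ratio stays well below 2.0;
        // the 64 s vs 60 s figures above correspond to roughly 1.07x.
        Console.WriteLine($"Speedup: {sequential / parallel:F2}x");
    }
}
```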

I expected that with hyper-threading I could run two threads concurrently and get close to twice as much computational work done in the same time - something like a 95% speedup. That doesn't appear to be the case.

What am I doing wrong? How can I do it right?

sharptooth
  • When you are just doing ` doWork()` once - is 50% of CPU utilized (as reported by task manager for example), or close to 100%? – Evk Nov 16 '17 at 10:01
  • 2
    Benchmark fail, surely. Hyperthreading ekes more compute cycles out of single core by taking advantage of a program not being able to keep all sub-units of the processor busy. How much you get greatly depends on the instruction mix of the "main" thread, 30% is the usual back-of-the-envelope number. That benchmark code just isn't representative of the typical mix and both threads compete for the exact same sub-units. Running something else as the 2nd thread that is branchy and uses floating point math or hits the memory controller heavy ought to give a different result. – Hans Passant Nov 16 '17 at 10:22
  • @Evk It's 49 percent load when one thread runs and 100 percent load when two threads run reported by Task Manager. – sharptooth Nov 16 '17 at 10:36
  • Including writing to the console in a parallel benchmark is just bonkers. On all but the most specialised hardware writing to the console is essentially and unavoidably a serial operation. Generally people with access to such hardware know how to use it, so I assume that OP doesn't have such access. – High Performance Mark Nov 16 '17 at 12:07
  • Writing to the console is done after the stopwatch is stopped so it's not included into the result. – sharptooth Nov 16 '17 at 12:44
  • @sharptooth With all respect, if your motivation for true-[PARALLEL] tools is the processing { throughput | performance }, any CPU-intensive code does not get better throughput from latency-masking by enabling a CPUcore to switch two HW-supported threads by the HyperThreading. Next, **the above presented assumption to expect ~ 95% speedup just from two HT HW-threads on the same CPUcore** could not be more unreal. Kindly check the Amdahl's Law, both the original formulation and its criticism in >>> https://stackoverflow.com/tags/parallelism-amdahl/info to get back to reality why this cannot fly – user3666197 Nov 16 '17 at 15:06

0 Answers