
I just had a (to me) very odd observation and want to know how this can be. I tested the following two versions of code:

chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
process_data(l, 8);
chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
chrono::duration<double> time_span = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
cout << "time used: " << time_span.count() << endl;

vs

chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
thread t(process_data, l, 8);
t.join();
chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
chrono::duration<double> time_span = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
cout << "time used: " << time_span.count() << endl;

For reasons I don't understand, the second version is 20% faster...

How can this be? The chrono::steady_clock should measure the time correctly, I think... But then I fail to see how creating another thread and waiting for it can actually be faster than doing the work on the initial thread. What am I missing?

Some details: there is no code besides the definition of l before the snippets posted above, and no other calculation comes after them (this is the main function). process_data() is just a massive number-cruncher, including some file-reading operations (no threads are used there).
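For completeness, here is a self-contained sketch of how both variants could be put into one program and averaged over several runs. process_data is replaced by a hypothetical number-cruncher stub here, since the real one isn't shown, and the input l is a dummy vector:

#include <chrono>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>
using namespace std;

// Hypothetical stand-in for the real process_data(l, 8), which isn't shown in the question.
void process_data(const vector<double>& l, int passes)
{
    volatile double sink = 0;                     // volatile so the work isn't optimized away
    for (int p = 0; p < passes; ++p)
        for (size_t i = 0; i < l.size(); ++i)
            sink += l[i] * l[i];
}

int main()
{
    vector<double> l(10000000, 1.5);              // dummy input standing in for the real l
    const int runs = 10;
    double direct = 0, threaded = 0;

    for (int i = 0; i < runs; ++i)
    {
        chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
        process_data(l, 8);                       // variant 1: run on the main thread
        chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
        direct += chrono::duration<double>(t2 - t1).count();

        t1 = chrono::steady_clock::now();
        thread t(process_data, cref(l), 8);       // variant 2: run on a worker thread
        t.join();
        t2 = chrono::steady_clock::now();
        threaded += chrono::duration<double>(t2 - t1).count();
    }

    cout << "direct:   " << direct / runs << " s on average" << endl;
    cout << "threaded: " << threaded / runs << " s on average" << endl;
}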

Mahrgell
  • What compiler are you using, for reference? – Theolodis May 06 '14 at 11:51
  • Are you measuring both versions in a single program or in different ones? And what time are we talking about? Seconds, milliseconds, microseconds? – MatthiasB May 06 '14 at 11:51
  • 2
    i'm using visual studio pro 2013, and i'm measuring it in 2 programs – Mahrgell May 06 '14 at 11:52
  • How many samples did you take of each one? – Eric Finn May 06 '14 at 12:02
  • 2
    what you could try is to average it over multiple executions, to get a more acurate result. I had the experience that execution time can vary from time to time, especially for short running programs. Also, I had the problem once that I was using a notebook for testing, where battery saving functions made measurements unreliable, even in performance mode. – MatthiasB May 06 '14 at 12:02
  • Is anything else running on the machine? Also, I reiterate MatthiasB's question: are we talking seconds, milliseconds, what? Lastly, did you re-run the first version after you ran the second version to confirm the timings? – kec May 06 '14 at 12:06
  • 1
    I tried it over 100 runs each, the first code takes 7,7 secs on average, the second one 6,5 secs. – Mahrgell May 06 '14 at 12:07
  • And no, nothing else is running on the machine. I did quite a lot of tests again and again with both versions, as it surprised me a lot. – Mahrgell May 06 '14 at 12:08
  • 1
    Is there any possibility that every time you run the second test the file data happens to be in memory, and every time you run the first test the file data is not in memory? Is there any possibility of NUMA effects on the machine you are using? – kec May 06 '14 at 12:28
  • Nope, I ran the tests in any order... It didn't matter. – Mahrgell May 06 '14 at 12:31
  • 1
    Do both threads have the same affinity? Priority? Allowing a 6-second computation to proceed on a CPU for which the OS is not competing can reduce time-to-completion. Wrap the whole thing in an additional thread and the 'benefit' to multithreading might go away. – AndrewS May 14 '14 at 19:54
  • Are you running it in `release` configuration? `debug` disables many optimizations. – ButterDog Jun 05 '14 at 07:28
  • What CPU architecture are you using? What Visual Studio compile flags? That could help to narrow down the problem. After seeing this recent talk by the Visual Studio compiler developer Eric Brumer (http://channel9.msdn.com/Events/Build/2014/4-587), I realized that modern CPUs are crazy. ;-) Starting at 28:45, he showed a performance bug on Haswell where replacing two 128-bit assignments by a single 256-bit assignment reduced the overall performance by 60%. (Since then, they have fixed that specific problem but not in VS 2013.) – Philipp Claßen Jun 10 '14 at 22:32
  • This might have to do with CPU frequency scaling and the Intel CPU turbo feature too. Just so many variables. Unless you provide a single piece of code that people can take and run for themselves, it's hard to guess what's going on. – PlasmaHH Jun 25 '14 at 09:51
  • 1
    I do seem to be able to reproduce this. This is the code I tried: http://pastebin.com/DKk04Q5R – TripShock Jun 28 '14 at 19:47
  • Is it possible the compiler is reordering things? What does the disassembly look like? – Ben Jul 02 '14 at 13:03
  • I cannot reproduce this, using @TripShock's code paste. Using VS2013 on Win 8.1 Pro w/ an i7-4920K. Compiled in x86 w/ no threads takes roughly 5.05s; w/ threads it takes roughly 5.76s. Compiled in x64 w/ no threads is roughly 5.923s; w/ threads it takes roughly 5.924s. – leetNightshade Jul 14 '14 at 22:52
  • Reading the comments. Wow. I've got a situation where I added an instruction that _is never executed_, which resulted in ~10% slower performance. Very consistent and repeatable. I should look into that again and ask a question about it. – Michael Gazonda Aug 01 '14 at 05:17

1 Answer


The only overhead you add is the thread creation, since your main thread will just sleep until the join.

The thread creation overhead is negligible compared to your process_data, considering that your program takes 7.7 or 6.5 seconds to run.
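To get a feel for how small that overhead actually is, here is a minimal sketch that times nothing but creating and joining a do-nothing thread:

#include <chrono>
#include <iostream>
#include <thread>
using namespace std;

int main()
{
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    thread t([] {});   // the worker does nothing, so we only pay for creation + join
    t.join();
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();

    // Typically this is on the order of microseconds -- noise next to a 6-7 second computation.
    cout << "thread create+join: "
         << chrono::duration<double, micro>(t2 - t1).count() << " us" << endl;
}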

So your question now becomes: how come a worker thread is faster than the main thread?

There are many reasons why this could happen; a couple that come to mind:

  • When you create the new thread, it gets lucky and ends up on a core all by itself.
  • The OS or other programs have hooks/watchers attached to your main thread, which makes the machine as a whole run slower when your main thread isn't idle.

The OS and other programs usually go after the main thread of a process for communication, watching, etc., so it's not unusual for a main thread to be slower than a worker thread for big data processing.

Even if a thread has a higher priority, that doesn't guarantee that the work on that thread will run faster.
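If you want to test the affinity/priority idea rather than guess, a rough Windows-specific sketch (plain Win32 calls; pin_and_boost_current_thread is just an illustrative helper name) is to pin both variants to the same core and priority before the work starts:

#include <windows.h>

// Hypothetical helper: pin the calling thread to core 0 and raise its priority so that
// the "direct" and the "worker thread" variants run under comparable conditions.
void pin_and_boost_current_thread()
{
    SetThreadAffinityMask(GetCurrentThread(), 1);                    // run on core 0 only
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);  // same priority in both variants
}

// Variant 1: call pin_and_boost_current_thread() in main right before process_data(l, 8).
// Variant 2: std::thread t([&]{ pin_and_boost_current_thread(); process_data(l, 8); }); t.join();

If the timing difference disappears with both variants pinned the same way, scheduling rather than the clock is the likely explanation.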

Here's another similar question: Why main thread is slower than worker thread in pthread-win32?

MichaelCMS