
I want to write a paper on compiler optimizations for HyperThreading. The first step is to investigate why a processor with HyperThreading (simultaneous multithreading) can deliver poorer performance than a processor without this technology. To do that, I need to find an application that runs better without HyperThreading, so I can collect hardware performance counters on it. Any suggestions on how or where I could find one?

So, to summarize: I know that HyperThreading's impact ranges between -10% and +30%. I need a C application that falls into that ~10% performance-penalty range.

Thanks.

Sorin
  • The modern versions of hyperthreading on i7 are different from the P4 NetBurst version. Which are you interested in? The figures you quote sound like they're for NetBurst, but who on earth is still interested in that train wreck? – David Heffernan Mar 04 '11 at 15:47
  • Did you manage to find such a program? – netvope Feb 18 '12 at 09:10

2 Answers


Probably the main drawback of hyperthreading is the effective halving of cache sizes. Each thread will be populating the cache, and so each, in effect, has half the cache.

To create a programme which runs worse with hyperthreading than without, create a single-threaded programme which performs a task that just fits inside the L1 cache. Then add a second thread which shares the workload, working from "the other end" of the data, as it were. You will find performance falls through the floor - this is because both threads must now fall back to the L2 cache.

Hyperthreading can dramatically improve or worsen performance. It is completely dependent on use. None of this -10%/+30% stuff - that's ridiculous.


I'm not familiar with compiler optimizations for HT, nor with the differences between i7's HT and P4's that David pointed out. However, you can expect some general behaviors.

Context switching is very expensive. So if you have one core and run two threads on it, switching back and forth between them always gives you a performance penalty. However, threads do not use the core all the time. For example, when a thread reads or writes memory, it just waits for the memory access to complete without using the core, often for more than 100 cycles. There are many other cases where a thread needs to stall like this, e.g., I/O operations, data dependencies, etc. Here HT helps, because it can swap out the waiting (or blocked) thread and execute another thread instead.

Therefore, if all threads are really unlikely to be blocked, the extra hardware thread only adds overhead. Think of a very computation-bound application working on a small set of data.

MHC
  • @David Heffernan: I didn't find any paper/report that explicitly states the differences of HT in i7 vs NetBurst, and honestly, I don't see why it should be different (except, maybe, for the shared resources). Could you please explain what you are referring to? Thanks – Sorin Mar 07 '11 at 11:00
  • Thank you for your response. As far as I know, HT is fine-grained, the reason being that coarse-grained switching would not fit - e.g. making a context switch only on stalls (like a cache miss) would mean emptying the pipeline, which is an expensive operation. Therefore fast context switching was one of the main focuses when HT was developed. I think I will try to write an application where a high number of threads leads to resource competition (cache, TLB, etc.) – Sorin Mar 07 '11 at 11:19