How much speedup does a hyper thread give? (in theory)

  • Mikhail

    I'm wondering what the theoretical speedup is for hyper-threaded CPUs. Assuming 100% parallelization and zero communication, two CPUs would give a speedup of 2. What about a hyper-threaded CPU?

  • Answers
  • Konrad Rudolph

    As others have said, this depends entirely on the task.

    To illustrate this, let’s look at an actual benchmark:

    [benchmark plot: relative speed-up of four string-matching algorithms vs. number of threads]

    This was taken from my master thesis (not currently available online).

    This shows the relative speed-up¹ of string matching algorithms (every colour is a different algorithm). The algorithms were executed on two Intel Xeon X5550 quad-core processors with hyperthreading. In other words: there were a total of 8 cores, each of which can execute two hardware threads (= “hyperthreads”). Therefore, the benchmark tests the speed-up with up to 16 threads (which is the maximum number of concurrent threads that this configuration can execute).
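    Concretely, relative speed-up is just the single-thread runtime divided by the n-thread runtime. A minimal sketch (the timing values are made up purely for illustration):

```python
def relative_speedup(runtimes):
    # Map {thread_count: runtime_seconds} to {thread_count: speed-up factor},
    # using the single-thread runtime as the baseline.
    baseline = runtimes[1]
    return {n: baseline / t for n, t in sorted(runtimes.items())}

# Hypothetical timings for a well-scaling algorithm (made-up numbers):
print(relative_speedup({1: 16.0, 2: 8.0, 4: 4.0, 8: 2.0, 16: 1.0}))
# → {1: 1.0, 2: 2.0, 4: 4.0, 8: 8.0, 16: 16.0}
```

    By definition, the entry for one thread is always 1.0, which is why every curve in the plot starts at 1.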

    Two of the four algorithms (blue and grey) scale more or less linearly over the whole range; that is, they benefit from hyperthreading.

    The two other algorithms (red and green; an unfortunate choice for colour-blind readers) scale linearly for up to 8 threads and then stagnate. This clearly indicates that those algorithms don’t benefit from hyperthreading.

    The reason? In this particular case it’s memory load; the first two algorithms need more memory for the calculation, and are constrained by the performance of the main memory bus. This means that while one hardware thread is waiting for memory, the other can continue execution; a prime use-case for hardware threads.

    The other algorithms require less memory and don’t need to wait for the bus. They are almost entirely compute bound and use only integer arithmetic (bit operations, in fact). With no memory stalls to hide, both hardware threads on a core compete for the same execution units, so there is no benefit from hyperthreading.

    ¹ I.e. a speed-up factor of 4 means that the algorithm runs four times as fast as if it were executed with only one thread. By definition, then, every algorithm executed on one thread has a relative speed-up factor of 1.
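    The memory-bound vs. compute-bound contrast above can be probed with a small experiment. This is only an illustrative sketch (not the benchmark from the thesis): it runs a purely compute-bound integer workload on a growing number of worker processes, with a fixed amount of work per worker, so ideal scaling keeps the wall time flat up to the number of hardware threads the machine really has.

```python
import time
from multiprocessing import Pool

def compute_bound(n):
    # Pure integer bit operations: the work stays in registers and rarely
    # stalls on memory, like the red/green algorithms above.
    x = 0
    for i in range(n):
        x ^= (i << 3) | (i >> 2)
    return x

def run(workers, work_per_worker=500_000):
    # Fixed work *per worker*: with ideal scaling, wall time stays flat as
    # workers grow, up to the number of hardware threads actually available.
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(compute_bound, [work_per_worker] * workers)
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (1, 2, 4, 8, 16):
        print(workers, "workers:", round(run(workers), 3), "s")
```

    On a compute-bound workload like this you would expect the wall time to start climbing once the worker count exceeds the physical core count, since hyperthread siblings fight over the same integer units.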

  • geoffc

    The problem is, it depends on the task.

    The notion behind hyperthreading is that all modern CPUs have more than one execution unit; usually closer to a dozen or so now, divided between integer, floating point, and SSE/MMX/streaming (whatever it is called today).

    Additionally, each unit has a different speed. For instance, it might take an integer math unit 3 cycles to process something, while a 64-bit floating-point division might take 7 cycles. (These are mythical numbers, not based on anything.)

    Out-of-order execution helps a lot in keeping the various units as full as possible.

    However, no single task will use every execution unit at every moment; not even splitting the work into threads can fill them entirely.

    Thus the theory becomes: by pretending there is a second CPU, another thread can run on it and use the execution units left idle by, say, your audio transcoding, which is 98% SSE/MMX work and leaves the integer and float units almost totally idle.

    To me, this makes more sense in a single-CPU world, where faking out a second CPU allows threads to cross that threshold with little (if any) extra coding to handle the fake second CPU.

    In the 3/4/6/8-core world, with 6/8/12/16 logical CPUs, does it help? Dunno. As much? Depends on the tasks at hand.

    So to actually answer your question: it depends on the tasks in your process, which execution units they are using, and, on your CPU, which execution units are idle or underused and available to that second fake CPU.

    Some 'classes' of computational work are said to benefit (vaguely, generically). But there is no hard and fast rule, and for some classes it slows things down.
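    Incidentally, whether the OS even sees those "fake" CPUs can be checked from code. A sketch comparing logical CPUs with physical cores (the /proc/cpuinfo parsing is Linux-specific; the third-party `psutil.cpu_count(logical=False)` is an alternative):

```python
import os

def logical_cpus():
    # os.cpu_count() reports *logical* CPUs: with hyperthreading enabled,
    # this is typically twice the number of physical cores.
    return os.cpu_count()

def physical_cpus_linux():
    # Linux-only sketch: count unique (physical id, core id) pairs in
    # /proc/cpuinfo. Returns None if that info is unavailable (e.g. macOS).
    try:
        cores = set()
        phys = None
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("physical id"):
                    phys = line.split(":")[1].strip()
                elif line.startswith("core id"):
                    cores.add((phys, line.split(":")[1].strip()))
        return len(cores) or None
    except OSError:
        return None

if __name__ == "__main__":
    print("logical CPUs:  ", logical_cpus())
    print("physical cores:", physical_cpus_linux())
```

    If the two numbers differ by a factor of two, hyperthreading is enabled and half of what the OS schedules onto are hardware threads, not full cores.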

  • Scott

    I have some anecdotal evidence to add to geoffc’s answer: I have a Core i7 CPU (4-core) with hyperthreading and have played a bit with video transcoding, a task that requires some communication and synchronisation but has enough parallelism that you can effectively fully load up a system.

    In my experience, playing with how many CPUs were assigned to the task, the 4 hyperthreaded "extra" cores equated to approximately 1 extra CPU's worth of processing power. That is, the extra 4 "hyperthreaded" cores added about the same amount of usable processing power as going from 3 to 4 "real" cores.

    Granted, this is not strictly a fair test, as all the encoding threads were likely competing for the same resources in the CPUs, but to me it did show at least a minor boost in overall processing power.

    The only real way to show whether it truly helps would be to run a few different integer/floating-point/SSE-type tests at the same time on a system with hyperthreading enabled and then disabled, and to see how much processing power is available in that controlled environment.
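    A rough harness for such a test might launch homogeneous and mixed pools of integer and floating-point workers and compare wall times. This is only a sketch of the idea: Python interpreter overhead will swamp real execution-unit effects, so treat any numbers as illustrative.

```python
import time
from multiprocessing import Pool

def int_work(n):
    # Exercises the integer units: multiplies and bit operations.
    x = 1
    for i in range(1, n):
        x = (x * 3) ^ i
    return x

def float_work(n):
    # Exercises the floating-point units: multiply-adds.
    x = 1.0
    for _ in range(n):
        x = x * 1.0000001 + 0.5
    return x

def run_mix(funcs, n=300_000):
    # Time a pool in which each worker runs one function from `funcs`.
    start = time.perf_counter()
    with Pool(len(funcs)) as pool:
        for res in [pool.apply_async(f, (n,)) for f in funcs]:
            res.get()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Homogeneous vs. mixed workloads at the same worker count. On a
    # hyperthreaded machine the mixed run *may* co-schedule better, since
    # sibling threads contend less when they use different execution units.
    w = 4
    print("int only  :", round(run_mix([int_work] * w), 3), "s")
    print("float only:", round(run_mix([float_work] * w), 3), "s")
    print("mixed     :", round(run_mix([int_work, float_work] * (w // 2)), 3), "s")
```

    Repeating the three runs with hyperthreading toggled in the BIOS, as suggested above, is what would actually isolate its contribution.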

  • Stephen Darlington

    It depends a lot on the CPU and workload as others have said.

    Intel says:

    Measured performance on the Intel® Xeon® processor MP with Hyper-Threading Technology shows performance gains of up to 30% on common server application benchmarks for this technology

    (This seems a bit conservative to me.)

    And there's another, longer paper (that I've not read all of yet) with more numbers here. One interesting take-away from that paper is that hyperthreading can make things slower for some tasks.

    AMD's Bulldozer architecture could be interesting. They describe each core as effectively 1.5 cores. It's kind of extreme hyperthreading or sub-standard multi-core, depending on how confident you are of its likely performance. The numbers in that piece suggest a speed-up of between 0.5x and 1.5x.

    Finally, performance also depends on the operating system. The OS will, hopefully, send processes to real CPUs in preference to the hyperthreads that are merely masquerading as CPUs. Otherwise, in a dual-core system, you may end up with one idle core and one very busy core whose two threads thrash each other. I seem to recall that this happened with Windows 2000, though, of course, all modern OSes are suitably capable.
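    On Linux, you can also take this into your own hands and pin a process to chosen logical CPUs, for example to keep two busy processes off sibling hyperthreads. A sketch (`os.sched_setaffinity` is Linux-only, and which logical CPUs are siblings varies by machine):

```python
import os

def pin_to_cpus(cpus):
    # Linux-only: restrict the calling process to the given logical CPUs,
    # a manual way to keep busy processes off sibling hyperthreads.
    if not hasattr(os, "sched_setaffinity"):
        return None  # not available on this OS (e.g. macOS)
    os.sched_setaffinity(0, cpus)  # 0 = the calling process
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Which logical CPUs share a physical core varies by machine; on Linux,
    # /sys/devices/system/cpu/cpu0/topology/thread_siblings_list shows the
    # hyperthread siblings of CPU 0. Here we just pin to CPU 0 as a demo.
    print(pin_to_cpus({0}))
```

    Modern schedulers do this automatically, so manual pinning is mostly useful for benchmarking experiments like the ones discussed above.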

  • Related Question

    cpu - What is hyper-threading and how does it work?
  • Gordon Gustafson

    I've heard the term hyper-threading thrown around a bit recently, what exactly is hyper-threading and why is it important?

  • Related Answers
  • Josh K

    Hyper-threading is where your processor pretends to have 2 physical processor cores, yet only has 1 and some extra junk.

    The point of hyperthreading is that many times when you are executing code, parts of the processor are idle. By including an extra set of CPU registers, the processor can act like it has two cores and thus use all parts of the processor in parallel. When the 2 cores both need to use one component of the processor, one of them ends up waiting, of course. This is why hyperthreading cannot replace a true dual-core processor.

  • CJM

    Hyper-Threading is where two threads are able to run on one single-threaded core. When a thread on the core in question is stalling or in a halt state, hyper-threading enables the core to work on a second thread instead.

    Hyper-threading makes the OS think that the processor has double the number of cores, and it often yields a performance improvement, but only in the region of 15-30% overall; in some circumstances there may actually be a performance hit (up to 20%).

    Currently, most Atom chips and all i7 (and Xeon-equivalent chips) have hyper-threading, as did some older P4s. In the case of the Atoms, it's a desperate attempt to improve performance without increasing power consumption much; in the case of i7s, it differentiates them from the i5 range of chips.

    Complex processing work won't benefit much from HT, but certain (simple, highly multi-threaded) tasks, such as video encoding, do. In reality, there is not a lot in it...