HyperThreading is an extension of superthreading, a concept that's been around in the microprocessor world for quite a while and has been implemented in various guises over the years. But before we cover what HyperThreading (and therefore superthreading) is, let's look at what happens on a modern single-core, single-processor x86 chip running a modern x86-compatible operating system such as (but not limited to) Windows 2000, Windows XP or Linux.
Programs written for a modern operating system execute as threads. Threads of code, containing the instructions and data needed to complete that part of the program, are dispatched to the processor via the operating system scheduler. The scheduler decides which thread should be executed next and dispatches it to the processor at the next OS time slice (or quantum) interval. As far as the thread is concerned, it has full access to the processor for as long as it needs and will keep the processor until it's done. In reality something different happens: the operating system and processor work together to rapidly switch between many different executing threads. Imagine what would happen if they didn't.
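The slice-by-slice handover the scheduler performs can be sketched as a simple round-robin simulation. This is an illustration only, not how any real kernel is written; the thread names and time units are made up for the example.

```python
from collections import deque

def round_robin(threads, quantum):
    """Simulate an OS scheduler handing out fixed time slices.

    `threads` maps a thread name to its remaining work (in arbitrary
    time units); `quantum` is the slice each thread gets before it is
    preempted. Returns the order in which slices were granted.
    """
    ready = deque(threads.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        timeline.append(name)                # thread runs for one slice
        remaining -= quantum
        if remaining > 0:
            ready.append((name, remaining))  # preempted, back of the queue
    return timeline

# Three hypothetical threads with different amounts of work, 10-unit slices:
print(round_robin({"audio": 10, "display": 20, "compute": 30}, 10))
# → ['audio', 'display', 'compute', 'display', 'compute', 'compute']
```

Notice how the long-running compute thread never locks the others out: everyone gets a turn each round, which is exactly why the system stays responsive.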
The operating system has its own management threads that process keyboard and mouse input so you can point, click, type and manipulate the OS and the programs running on it. There's also a thread that executes the display driver so you can see what's going on, a thread for audio processing so you can hear sound, and so on. Now if each thread could monopolise the processor at the expense of all the others, you'd hypothetically see a few things happening. First the screen would redraw, but your mouse and keyboard input wouldn't show and no audio would play. Then all of a sudden a piece of audio would play and stop, then the screen would redraw again, this time with your new mouse position and keyboard input displayed. Things would appear very erratic and it would be impossible to use your computer. Throw something like SETI into the mix, which might take 3 hours or more to complete, and you couldn't see, hear or use anything on your computer until it was done!
So what happens is that, transparently to the executing threads, the operating system and processor swap threads onto and off of the processor, saving their state and restoring it as needed. When a thread has had a certain amount of time executing on the CPU, typically a few tens of milliseconds, the operating system tells the processor to save that thread's state to memory; another thread is then swapped in and its execution state set up, ready to go. Execution state can consist of everything from data in memory to the contents of internal CPU registers. These saved states, containing CPU resources like the registers and the stack pointer, are called CPU contexts: the context in which the CPU is executing. So the CPU rapidly saves and restores contexts at every interval the operating system's scheduler requests. The downside is that a context switch is an 'expensive' CPU operation; it takes quite a few clock cycles to complete. It's up to the operating system scheduler to preempt a thread when needed, swap its context off the CPU and swap a new thread's context on. All of this happens transparently to the thread, since its context and state are restored perfectly and it has no idea it was ever swapped out to let something else run.
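The save-and-restore dance above can be illustrated with a toy simulation. Real context switches happen in kernel assembly and save far more state; the register names here merely echo x86 and the addresses are invented for the example.

```python
# Hypothetical sketch of a context switch: save one thread's register
# state to memory, run another thread, then restore the first thread
# exactly where it left off. A simulation, not real kernel code.

class CPU:
    def __init__(self):
        # A tiny subset of the state a real switch would preserve.
        self.registers = {"eip": 0, "esp": 0, "eax": 0}

saved_contexts = {}  # thread id -> saved register state (in "memory")

def save_context(thread_id, cpu):
    # Copy the CPU's register state out to memory.
    saved_contexts[thread_id] = dict(cpu.registers)

def restore_context(thread_id, cpu):
    # Load the saved state back; the thread resumes unaware
    # that it was ever preempted.
    cpu.registers = dict(saved_contexts[thread_id])

cpu = CPU()
cpu.registers["eip"] = 0x401000   # thread A is executing here
save_context("A", cpu)            # scheduler preempts thread A
cpu.registers["eip"] = 0x500000   # thread B runs for its slice
restore_context("A", cpu)         # thread A is swapped back on
print(hex(cpu.registers["eip"]))  # → 0x401000, back where A left off
```

The cost the article mentions comes from exactly these copies (plus cache and TLB effects on real hardware): every switch burns cycles doing bookkeeping rather than useful work.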
In that manner, threads all get roughly the same amount of time to execute on the processor, and things appear responsive even though there is a lot going on at once. Thread priority determines how much time a thread gets on the processor before it is preempted and swapped off to make room for the next one. That's why you can raise a thread's priority to make sure your task gets as much processor time as it needs. The downside is that you can mistakenly give a processor-intensive application too much time on the processor before it yields or is preempted, and you get the scenario outlined above, where things are very unresponsive until other threads get their chance on the processor.
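One way to picture the priority trade-off is as a weighted share of processor time. This is a simplified model (the numbers and the proportional-share rule are assumptions for illustration, not how any particular OS computes quanta):

```python
def processor_share(threads):
    """Hypothetical weighted scheduling model: each thread's share of
    processor time is proportional to its priority value."""
    total = sum(threads.values())
    return {name: priority / total for name, priority in threads.items()}

# A CPU-hungry task mistakenly given a high priority squeezes out
# the interactive threads:
share = processor_share({"compute": 24, "ui": 8, "audio": 8})
print(f"compute gets {share['compute']:.0%} of the processor")
# → compute gets 60% of the processor
```

Push the compute thread's priority higher still and the UI and audio threads' shares shrink towards zero, which is the unresponsive scenario described above.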
Now this applies to HyperThreading in a number of ways. Firstly, on the Pentium 4, analysis of the processor during the execution of a typical set of tasks showed that only around 35% of the available execution resources on the NetBurst core were being used at any one time. In other words, the processor is on average only about 35% utilised by the tasks you give it. This is caused mainly by code not being optimised for the NetBurst architecture and its peculiarities compared to other x86 designs: the code doesn't know how to maximise the use of the processor's execution units to get as much work done as possible. So the compiler plays a part in this too. Even though the processor is a modern design, for the most part it still has to run old code built with compilers that know nothing of the new architecture.
Realising that this was the case, and that the processor was largely sitting idle rather than being used to its full potential, Intel turned to a number of optimisation techniques that could help it get the most out of its new architecture.
An extension of the superthreading and simultaneous multithreading (SMT) school of thinking, HyperThreading at its core lets more than one thread of execution run on the processor at once. Given that a regular non-HyperThreading x86 processor has only one set of everything needed to run x86 code, how can a HyperThreading-enabled P4 run more than one thread at once?