Guts of the beast
The guts of the beast: loop-stream detector and branch prediction enhancements
Improved memory access is a fundamental determinant of performance, but it works best if the underlying core engine can take advantage of it.
We know that Core 2's potency comes from having a four-wide decoding engine, plus a macro-op fusion feature that works only in 32-bit mode. Fusion pairs two adjacent instructions (typically a compare and the conditional jump that follows it) so that they are handled as one, potentially letting a fifth instruction through each cycle. Core i7 keeps the same setup but extends macro-op fusion to 64-bit mode. What's more, Intel reckons that a greater number of instruction pairs are eligible for fusion, boosting throughput.
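To make the pairing concrete, here is a minimal Python sketch that treats macro-op fusion as a counting exercise. It assumes the fusible pair is a compare-type instruction (`cmp` or `test`) immediately followed by a conditional jump. The mnemonic sets and the `fuse` helper are hypothetical names invented for illustration, not Intel's decoder logic or a complete list of eligible pairs.

```python
# Toy model of macro-op fusion: a compare followed directly by a
# conditional jump occupies one decode slot instead of two.
# The mnemonic sets below are illustrative assumptions, not a full list.
COMPARES = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jge", "ja", "jb"}


def fuse(instructions):
    """Return the decode slots left after pairing compare + conditional jump."""
    slots = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if current in COMPARES and following in COND_JUMPS:
            slots.append(current + "+" + following)  # one fused slot
            i += 2
        else:
            slots.append(current)
            i += 1
    return slots


# Hypothetical loop body: five instructions, ending in a compare and jump.
loop_body = ["mov", "add", "inc", "cmp", "jne"]
fused = fuse(loop_body)
print(len(loop_body), "instructions ->", len(fused), "decode slots")
```

In this example the five instructions fit into four slots, which is the sense in which a four-wide decoder can occasionally behave like a five-wide one.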
Another difference is in how Core 2's Loop Stream Detector has been upgraded. The LSD is a piece of logic that looks out for repetitive execution of the same software instructions (loops). Should the software require the CPU to process the same handful of instructions over and over again, there's little sense in pushing them all through the full pipeline each time. Rather, the LSD creates a shorter path by removing the need for, say, branch prediction (why predict when you already know which instructions are coming?). On Core 2 the LSD holds 18 x86 instructions and is located between the fetch and decode stages of the pipeline.
Core i7 uses, wait for it, an upgraded LSD that holds 28 micro-ops (decoded operations, not x86 instructions), meaning that more loops can be detected and steered away from the full pipeline. What's new here is that the LSD now sits after the decode stage rather than between fetch and decode. Why? Intel's Ronak Singhal reckons it allows the CPU to circumvent more of the pipeline, reducing power and increasing performance.
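The effect of moving the LSD can be sketched with a toy model. The capacities and positions below are the ones quoted above (18 x86 instructions after fetch on Core 2, 28 micro-ops after decode on Core i7). The three-stage front-end list, the `stages_bypassed` function and the example loop sizes are simplifying assumptions for illustration, not a description of Intel's actual detection logic.

```python
# Toy model of the loop-stream detector (LSD). Capacities and positions
# are the figures quoted in the article; the stage list is a simplification.
FRONT_END = ["branch prediction", "fetch", "decode"]

# "after" names the last front-end stage that sits before the LSD.
CORE_2 = {"capacity": 18, "unit": "x86 instructions", "after": "fetch"}
CORE_I7 = {"capacity": 28, "unit": "micro-ops", "after": "decode"}


def stages_bypassed(lsd, loop_size):
    """Return the front-end stages skipped while a captured loop replays.

    loop_size is measured in the LSD's own unit. A loop larger than the
    capacity is assumed not to be captured, so nothing is bypassed.
    """
    if loop_size > lsd["capacity"]:
        return []
    cut = FRONT_END.index(lsd["after"]) + 1
    return FRONT_END[:cut]


# Hypothetical loop sizes, chosen only to show the three cases.
print(stages_bypassed(CORE_2, 12))   # fits on Core 2: decode still runs
print(stages_bypassed(CORE_I7, 12))  # fits on Core i7: decode is skipped too
print(stages_bypassed(CORE_2, 24))   # too large for Core 2: nothing skipped
```

The model deliberately ignores how loops are detected in the first place; it only shows why a later position in the pipeline lets a captured loop skip more work.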
An effective branch predictor is crucial in keeping multi-core CPUs busy with instructions. Its job concerns conditional branches: instructions whose direction depends on results from other instructions that may not yet be available. Rather than halting the general instruction flow to the hungry execution units until the outcome is known, an effective branch predictor intelligently guesses what's going to happen so that work can continue. Branch prediction is an area of ongoing improvement across successive generations from both AMD and Intel, and the latter reckons Core i7's branch predictor has been modified to provide better performance, although hard-and-fast numbers weren't being divulged.
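Since Intel isn't divulging how Core i7's predictor works, the sketch below uses the textbook two-bit saturating counter instead, purely to show what "intelligently guessing" can look like. It is not Core i7's scheme. The function name and the example branch pattern are hypothetical.

```python
# Textbook two-bit saturating-counter branch predictor (illustrative only;
# not a description of Core i7's undisclosed predictor).
def predict_accuracy(outcomes, state=2):
    """Return the fraction of branches predicted correctly.

    outcomes: iterable of booleans, True meaning the branch was taken.
    state: counter from 0 (strongly not taken) to 3 (strongly taken).
    """
    correct = 0
    total = 0
    for taken in outcomes:
        prediction = state >= 2          # predict taken in states 2 and 3
        correct += prediction == taken
        total += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / total if total else 0.0


# Hypothetical loop branch: taken nine times, then falls through once,
# with the whole loop run ten times.
loop_branch = ([True] * 9 + [False]) * 10
print(predict_accuracy(loop_branch))
```

With this pattern the counter mispredicts only the loop exit on each run, so nine guesses in ten are right. A branch that alternates between taken and not taken fares far worse under the same scheme, which is one reason real predictors are considerably more elaborate.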
The new processor also adds SSE4.2 support, with a further seven instructions that, on the face of it, are also aimed at the server/workstation market.
Core i7's front-end is generally the same as Core 2's, then, and most of the improvements revolve around increasing efficiency a touch here and there, with most of the gains looking to come in the server environment (64-bit macro-op fusion and the LSD, for example).