Less cache is more?
Three-level cache and optimisations in general

Core 2 uses a two-level cache hierarchy: each core has its own 64KB of L1 cache, split evenly between data and instructions, and shares an L2 of up to 12MB on (dual-die) quad-core models. This works well enough for a dual-core CPU, but its effectiveness decreases as cores are added, according to Intel.
Core i7, though, keeps the per-core L1 cache intact, albeit with a few improvements, but adds a very low-latency 256KB of L2 per core - 1MB in total for the quad-core part.
To keep things chugging along without ever more trips back to system memory, a third-level cache is added to the mix. It's shared amongst all the cores in the processor and its size scales with core count - quad-core Core i7 parts will have 8MB. What's important is that it's inclusive, meaning that any data held in a core's L1 or L2 is guaranteed to be present in L3 as well. A miss in L3 therefore proves the data isn't in any of the smaller caches, so there's no point wasting time looking in them. One downside, we suppose, is that an inclusive setup can effectively reduce total cache capacity, since data held in L1/L2 is duplicated in L3.
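The inclusive property can be pictured as a simple invariant: the L3 set always contains everything in the L1 and L2 sets, so one L3 probe answers for all three. A minimal sketch in Python - the class, names and addresses are illustrative, not Intel's implementation:

```python
# Toy model of an inclusive last-level cache. The invariant: every line
# held in L1 or L2 is also present in L3, so an L3 miss proves the line
# is in neither smaller cache. Contents are plain sets of line addresses.

def line_of(addr, line_size=64):
    """Map a byte address to the start of its 64-byte cache line."""
    return addr & ~(line_size - 1)

class InclusiveHierarchy:
    def __init__(self):
        self.l1, self.l2, self.l3 = set(), set(), set()

    def fill(self, addr, level):
        """Bring a line into the given level, maintaining inclusion in L3."""
        line = line_of(addr)
        self.l3.add(line)                # inclusion: L3 holds everything
        if level <= 2:
            self.l2.add(line)
        if level == 1:
            self.l1.add(line)

    def may_be_in_core_caches(self, addr):
        """A single L3 probe answers for L1 and L2 as well."""
        return line_of(addr) in self.l3

cache = InclusiveHierarchy()
cache.fill(0x1000, level=1)              # line cached in L1 (and so L2, L3)

print(cache.may_be_in_core_caches(0x1010))  # True: same 64-byte line
print(cache.may_be_in_core_caches(0x9000))  # False: L3 missed, so no need
                                            # to search L1 or L2 at all
```

The payoff in a multi-core chip is that a miss in the shared L3 means no other core's private caches need to be snooped either.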
Speaking more of memory and access, Core 2's architecture worked best when memory accesses were aligned to what are termed cache lines (64-byte boundaries). If an access straddled a boundary, as is often the case, the CPU wasted time splitting the load. Making matters worse, unaligned-load instructions were slow even when the address itself happened to be aligned. Core i7 does away with such hindrances and increases efficiency, according to Intel.
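The arithmetic behind those 64-byte boundaries is simple enough to sketch. A short Python illustration (the helper names are ours, not anything from the architecture):

```python
LINE = 64  # cache-line size in bytes, per the article

def crosses_line(addr, size, line=LINE):
    """True if a `size`-byte access starting at `addr` straddles two lines."""
    return (addr // line) != ((addr + size - 1) // line)

def align_up(addr, line=LINE):
    """Round an address up to the next 64-byte boundary."""
    return (addr + line - 1) & ~(line - 1)

print(crosses_line(0x40, 8))   # False: 8 bytes at 0x40 sit in one line
print(crosses_line(0x7C, 8))   # True: bytes 0x7C..0x83 straddle 0x80
print(hex(align_up(0x7C)))     # 0x80
```

On Core 2 the second case is the expensive one - the load touches two cache lines - which is why compilers and allocators go to some trouble to keep hot data aligned.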
Lastly, Core i7 employs a new TLB (translation lookaside buffer) arrangement. The TLB is a small table on the CPU that caches recently used virtual-to-physical address translations - the mappings otherwise held in the operating system's page tables in memory - so the processor doesn't have to walk those tables on every access. It's constantly updated as the CPU churns through instructions.
Core i7 continues with Core 2's two-level TLB but increases the L1 data TLB from 16 to 64 entries, and adds a new low-latency L2 TLB that can handle 512 entries. What you should take away from this is that it's another measure designed to reduce latency.
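The two-level lookup can be sketched as a pair of small caches sitting in front of the page tables. A toy model in Python - the entry counts follow the article, but the LRU replacement policy and the dict standing in for the page tables are simplifying assumptions of ours:

```python
from collections import OrderedDict

PAGE = 4096  # assume 4 KiB pages

class TwoLevelTLB:
    """Toy two-level TLB: a small L1 checked first, a larger L2 behind it,
    and a full page-table walk only when both miss."""

    def __init__(self, page_table, l1_size=64, l2_size=512):
        self.page_table = page_table   # stand-in for in-memory page tables
        self.l1 = OrderedDict()        # virtual page number -> physical frame
        self.l2 = OrderedDict()
        self.l1_size, self.l2_size = l1_size, l2_size

    def _install(self, tlb, size, vpn, frame):
        tlb[vpn] = frame
        tlb.move_to_end(vpn)
        if len(tlb) > size:
            tlb.popitem(last=False)    # evict the least-recently-used entry

    def translate(self, vaddr):
        """Return (physical address, level that hit: 1, 2, or 0 for a walk)."""
        vpn, offset = divmod(vaddr, PAGE)
        if vpn in self.l1:
            self.l1.move_to_end(vpn)
            return self.l1[vpn] * PAGE + offset, 1
        if vpn in self.l2:
            frame = self.l2[vpn]
            self._install(self.l1, self.l1_size, vpn, frame)
            return frame * PAGE + offset, 2
        frame = self.page_table[vpn]   # miss in both: costly page-table walk
        self._install(self.l2, self.l2_size, vpn, frame)
        self._install(self.l1, self.l1_size, vpn, frame)
        return frame * PAGE + offset, 0

tlb = TwoLevelTLB(page_table={0: 7, 1: 3})
print(tlb.translate(0x10))   # first access to the page walks the tables
print(tlb.translate(0x20))   # second access to the same page hits L1
```

The design point is the same one the article makes about caches generally: the bigger L2 TLB catches translations the tiny L1 has evicted, so far fewer accesses pay for a full page-table walk.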