Zen 2, the architecture that powers Ryzen 3000-series processors, is an evolution of first-generation Zen rather than a ground-up design. This makes intuitive sense because AMD already made a clean break with Zen compared to the maligned Bulldozer/Excavator cores.
Evolutionary designs enable engineers to pick off all the low-hanging fruit missed first time around, iron out bottleneck kinks, and then focus on laying down transistors that enhance performance.
Ryzen 3000-series CPUs are designed primarily to boost the all-important instructions per clock cycle (IPC) metric which historically has been lacking on AMD chips when compared directly to Intel. IPC has become more important as liberal increases in frequency have dried up: reliably hitting 5GHz on any modern processor is extremely difficult.
AMD claims Ryzen 3000, whose execution cores are hewn from a leading-edge 7nm process, increases IPC by a full 15 per cent compared to original Zen, which is impressive given the base architecture is familiar, and that's without taking any additional frequency headroom into account.
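To put that claim in context, per-core throughput can be roughly approximated as IPC multiplied by clock speed. The sketch below is a back-of-the-envelope model only: the function name and the 5 per cent clock uplift are illustrative assumptions, not AMD figures.

```python
def relative_performance(ipc_gain, clock_gain=0.0):
    """Speed-up over a baseline, modelling performance as IPC x frequency."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain)


# AMD's claimed 15 per cent IPC uplift at an unchanged clock speed.
same_clock = relative_performance(0.15)

# The same uplift combined with a hypothetical 5 per cent higher clock.
with_clock = relative_performance(0.15, clock_gain=0.05)

print(f"same clock: {same_clock:.2f}x, with faster clock: {with_clock:.2f}x")
```

The two gains multiply rather than add, which is why an IPC improvement remains valuable even when frequency headroom is small.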
Improving the front-end - deeper and smarter
At the front end of the architecture, Ryzen 3000 improves upon previous generations in a few key ways. It uses what is known as a TAGE branch predictor that carries a deeper branch history than its predecessor. It also beefs up the various buffers and makes the micro-op cache bigger. Each of these features improves performance, or IPC, by a per cent or two.
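Why a deeper branch history helps can be shown with a toy model. The sketch below is a heavily simplified TAGE-style predictor and makes no claim to match AMD's implementation: the class name, table count and history lengths are all assumptions, and real designs use hashed indices, partial tags and usefulness counters that are omitted here. It keeps only the core idea, which is that several tables are keyed on progressively longer slices of history and the longest match provides the prediction.

```python
class ToyTage:
    """Toy TAGE-style predictor: a base table plus tagged tables that are
    keyed on progressively longer slices of branch history."""

    def __init__(self, history_lengths=(4, 8, 16)):
        self.history_lengths = tuple(history_lengths)
        self.max_len = max(self.history_lengths, default=0)
        self.base = {}  # branch address -> 2-bit counter (0..3)
        self.tagged = [{} for _ in self.history_lengths]
        self.history = ()  # most recent outcomes, newest last

    def _key(self, pc, length):
        return (pc, self.history[-length:])

    def _provider(self, pc):
        """Find the longest-history table holding an entry for this branch."""
        for i in reversed(range(len(self.tagged))):
            key = self._key(pc, self.history_lengths[i])
            if key in self.tagged[i]:
                return i, self.tagged[i], key
        return -1, self.base, pc  # fall back to the history-free base table

    def predict(self, pc):
        _, table, key = self._provider(pc)
        return table.get(key, 2) >= 2  # counters of 2 or 3 mean "taken"

    def update(self, pc, taken):
        i, table, key = self._provider(pc)
        counter = table.get(key, 2)
        mispredicted = (counter >= 2) != taken
        table[key] = min(3, counter + 1) if taken else max(0, counter - 1)
        # On a misprediction, allocate an entry in the next longer table.
        if mispredicted and i + 1 < len(self.tagged):
            longer = self._key(pc, self.history_lengths[i + 1])
            self.tagged[i + 1][longer] = 2 if taken else 1
        if self.max_len:
            self.history = (self.history + (taken,))[-self.max_len:]


def mispredictions(predictor, pattern, repeats, pc=0x400):
    """Count mispredictions over the second half of a repeated pattern."""
    outcomes = list(pattern) * repeats
    warm_up = len(outcomes) // 2
    misses = 0
    for n, taken in enumerate(outcomes):
        if n >= warm_up and predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch: taken seven times, then not taken once on loop exit.
loop = [True] * 7 + [False]
bimodal_misses = mispredictions(ToyTage(history_lengths=()), loop, 100)
tage_misses = mispredictions(ToyTage(), loop, 100)
print(bimodal_misses, tage_misses)
```

With no history tables the predictor degenerates to a plain two-bit counter and keeps mispredicting the loop exit. With an eight-outcome history available, the toy version can learn the exit once it has warmed up.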
Wider execution and, finally, true AVX256
The architecture also widens issue capability by adding an extra address generation unit (AGU), which lets the CPU calculate more of the memory addresses needed by loads and stores in parallel. This is a handy feature for almost all applications.
Ryzen 3000-series can now process 256-bit AVX SIMD instructions in a single pass rather than splitting them into two 128-bit halves, doubling throughput over original Ryzen. This is important because it's one area where rival Intel has always enjoyed a decent lead, shown in well-tuned content-creation applications.
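The doubling can be illustrated with simple arithmetic. The sketch below is a deliberately crude model that assumes a 256-bit instruction is split into as many internal operations as the datapath width requires; it ignores execution ports, latency and memory bandwidth, and the function name is hypothetical.

```python
import math


def internal_ops(num_floats, datapath_bits, vector_bits=256, float_bits=32):
    """Internal operations needed to process num_floats 32-bit values
    using vector_bits-wide instructions on a datapath_bits-wide unit."""
    floats_per_instruction = vector_bits // float_bits
    instructions = math.ceil(num_floats / floats_per_instruction)
    passes_per_instruction = math.ceil(vector_bits / datapath_bits)
    return instructions * passes_per_instruction


narrow = internal_ops(1024, datapath_bits=128)  # original Ryzen-style 128-bit path
wide = internal_ops(1024, datapath_bits=256)    # Ryzen 3000-style 256-bit path
print(narrow, wide)  # prints "256 128"
```

The instruction count is the same in both cases; what changes is how many passes each instruction needs through the floating-point hardware.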
As on previous generations, four cores are grouped into what is known as a CCX and each four-core group has access to its own L3 cache. For Ryzen 3000, the big change is that each CCX's L3 cache is doubled, from 8MB to 16MB, and AMD markets the sum of L2 and L3 cache as GameCache. More cache is beneficial because it reduces the need to go out to slower main memory.
It's this combination of keeping as much data on-chip as possible, a smarter front-end, wider execution, and enhanced floating-point capability that offers more IPC than previous iterations of Ryzen. The exact gain depends on how much a given workload benefits from each of these performance-adding features - some respond excellently to heaps more cache and the associated lower average latency, others to floating-point - but in every case Ryzen 3000 CPUs ought to be faster than previous-generation Ryzens.
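The cache argument can be made concrete with the standard average-latency formula. The sketch below uses purely illustrative numbers: the 40-cycle L3 latency, 200-cycle memory latency and both hit rates are assumptions chosen for the example, not measurements of any Ryzen part.

```python
def average_latency(l3_hit_rate, l3_cycles=40, memory_cycles=200):
    """Average cost, in cycles, of an access that has already missed L2."""
    return l3_hit_rate * l3_cycles + (1.0 - l3_hit_rate) * memory_cycles


# Hypothetical workload: a larger L3 lifts the hit rate from 50 to 70 per cent.
smaller_l3 = average_latency(0.50)
larger_l3 = average_latency(0.70)

print(f"smaller L3: {smaller_l3:.0f} cycles, larger L3: {larger_l3:.0f} cycles")
```

How far the hit rate moves depends entirely on the workload's working set, which is why some applications gain much more from the extra cache than others.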
There is, however, another manifest reason why AMD has gone L3 cache-heavy. It's to do with how CPUs are constructed. More precisely, the nature of the flexible chiplet design.
Chips, chiplets, PCIe 4
AMD splits the CPU cores and I/O functions into two groups. The reason is the latter doesn't need to be on a leading-edge process - pure performance is less critical here.
The above graphic illustrates how Ryzen 3000's design makes the CCXs independent of the I/O block. It shows a couple of CCXs lashed together via Infinity Fabric to create an 8C16T chiplet, known as a CCD, complete with 4MB of L2 and 32MB of L3 cache.
This modular CCD is then connected to the I/O chip via a high-speed data fabric. The key takeaway is that, because of its relative simplicity, the I/O block can be produced on an older process. That's exactly the case: while TSMC's 7nm process is the go-to solution for the CCD silicon, GlobalFoundries' 12nm node provides the silicon backbone for the I/O.
The beauty is that, should AMD want more cores and threads for a particular segment, a second 8C16T CCD can be added and connected to the I/O via the same high-speed link. This is precisely what happens with the Ryzen 9 line of CPUs, which have more than eight cores.
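The modular arithmetic is easy to sketch. The example below tallies cores, threads and cache for one or two fully enabled CCDs using the figures quoted in this article; the 0.5MB of L2 per core is inferred from the 4MB quoted for eight cores, and the function name is hypothetical. It does not model parts that ship with some cores disabled.

```python
CORES_PER_CCX = 4
CCX_PER_CCD = 2
THREADS_PER_CORE = 2   # 8C16T implies two threads per core
L2_PER_CORE_MB = 0.5   # inferred: 4MB of L2 across eight cores
L3_PER_CCX_MB = 16


def package_summary(num_ccds):
    """Totals for a package built from num_ccds fully enabled CCDs."""
    ccx_count = num_ccds * CCX_PER_CCD
    cores = ccx_count * CORES_PER_CCX
    l2_mb = cores * L2_PER_CORE_MB
    l3_mb = ccx_count * L3_PER_CCX_MB
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "l2_mb": l2_mb,
        "l3_mb": l3_mb,
        "gamecache_mb": l2_mb + l3_mb,  # AMD's name for L2 plus L3
    }


one_ccd = package_summary(1)
two_ccds = package_summary(2)
print(one_ccd)
print(two_ccds)
```

Everything scales linearly with the CCD count, while the I/O die stays the same.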
The I/O block itself carries the dual-channel memory controller, which has been upgraded and optimised for Ryzen 3000 to run at much higher speeds. It also offers 16 lanes of PCIe Gen 4 for graphics, four PCIe Gen 4 lanes for NVMe/SATA drives, and a further four same-spec lanes for connecting to the chipset, along with four USB 3.1 G2 ports and other associated I/O goodies. Feature-rich, then.
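For a sense of scale, the lane allocation can be turned into rough bandwidth figures. The sketch below takes the lane counts from the paragraph above and the PCIe Gen 4 signalling rate (16GT/s per lane with 128b/130b encoding) from the PCIe specification rather than from AMD. The results are theoretical peaks per direction before protocol overhead, and the names used are illustrative.

```python
TRANSFERS_PER_SECOND = 16e9        # PCIe Gen 4: 16GT/s per lane
ENCODING_EFFICIENCY = 128 / 130    # 128b/130b line encoding

LANES = {"graphics": 16, "storage": 4, "chipset": 4}


def bandwidth_gb_per_s(lanes):
    """Theoretical peak bandwidth per direction, in GB/s."""
    bits_per_second = lanes * TRANSFERS_PER_SECOND * ENCODING_EFFICIENCY
    return bits_per_second / 8 / 1e9


total_lanes = sum(LANES.values())
for name, lanes in LANES.items():
    print(f"{name}: x{lanes}, about {bandwidth_gb_per_s(lanes):.1f} GB/s")
print(f"total lanes: {total_lanes}")
```

Real-world throughput will be lower once packet overhead and device limits are taken into account.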
Size matters - the power of 7nm
You would think that having a wider execution core, larger caches in general, and a massive 16MB of L3 would cause each of the CPU's CCXs to balloon in size compared to original Ryzen. That isn't the case because of the wonderful density of 7nm.
Even with the larger design and cache footprint, AMD reckons that each of Ryzen 3000's CCXs, built on 7nm, takes up just over half the space of original Ryzen's. Through some clever power management and work with TSMC, the EDA tool guys, and modelling, Ryzen 3000-series runs faster than the previous generations, too.
Ryzen 3000-series CPUs are smarter, more parallel, and more cache-rich than ever before. They're also blazingly fast. Think performance, think Ryzen 3000 series.