Intel Makes it Official: Hybrid CPU Cores Arrive With Alder Lake
One of the whispers we’ve heard about Alder Lake since it started making waves in the rumor mill is that this new CPU will offer a mixture of small and large cores. When these rumors initially surfaced (presumably from wherever they’d been milling around), there was some suspicion that Intel might be trying to match AMD’s recent Ryzen 3000 core counts with a combination of high-end Core and low-end Atom CPU cores.
At its Architecture Day this week, Intel confirmed that it would deploy a mixture of small and large cores in its Alder Lake silicon, but the point of the initiative isn’t to try and claim an 8+8 CPU core can match the performance of a 16-core chip. Instead, this is a move Intel is making to improve overall CPU power efficiency.
I don’t think we should count on Lakefield as a strong comparison for Alder Lake, but I wanted to put some context around the conversation. According to Intel, hybridizing Lakefield with a mixture of big and small cores resulted in a better balance of performance and efficiency than either core type could achieve independently. In Intel’s slide, the white boxes refer to improvements over the previous generation, while the blue boxes refer to gains compared to a hypothetical chip with only Tremont cores. What Intel is saying is that by combining the two types of CPU core it gets better overall results than relying on either alone.
That’s a total repudiation of what Intel thought roughly a decade ago, when the company predicted that a big.LITTLE approach like the one ARM was taking would prove inferior to its own implementation of DVFS (Dynamic Voltage and Frequency Scaling). Intel isn’t the only company that is at least curious about the idea; AMD has filed a patent application for an approach to switching between CPU cores based on the current instructions the CPU is asked to execute.
Adding support for these features will require Microsoft to add some advanced scheduling capabilities to Windows that have heretofore been reserved for its ARM OS, though Lakefield requires such capabilities in any case.
In mobile, there’s a straightforward use for these cores — they can reduce power consumption compared with a traditional big core, improving battery life. What good will they prove to be on desktop? I’m genuinely not sure, but I’ve got some ideas. According to Intel, its next-generation Gracemont core will add some type of vector performance capability.
Assume that’s a reference to AVX2, and it means Intel will have a low power core with what ought to be pretty good vector math performance. This is exactly the combination that won AMD’s Jaguar the Xbox One and PS4 SKUs. Pushing mid-level AVX2 workloads into the small cores could clear the larger CPUs for other tasks.
OK, so why is that potentially helpful?
Because right now, Intel CPUs take a 10-12 percent clock speed hit if they enable AVX2 and a hit of ~1.25x if they use AVX-512. The impact is significant enough that developers are advised against lightly ‘seasoning’ code with AVX-512. If you deploy it in the wrong way, you can actually penalize yourself: the clock speed you lose across your other tasks can outweigh what you gain from using AVX-512 for a small handful of operations.
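To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It is a simplified model, not anything Intel has published: it assumes the reduced clock applies to the whole program while wide vectors are in use, and it reads the "~1.25x" figure as the clock being divided by 1.25 (a ratio of 0.8). The function names and the 2x vector speedup are hypothetical, chosen only for illustration.

```python
def net_speedup(vector_fraction, vector_speedup, clock_ratio):
    """Overall speedup versus scalar code running at full clock.

    vector_fraction: share of the original runtime that gets vectorized (0..1)
    vector_speedup:  how much faster that share runs per clock (assumed value)
    clock_ratio:     reduced clock / normal clock, e.g. 1 / 1.25 = 0.8
    Assumption: the lower clock slows the entire program, not just the
    vectorized part. Real chips behave in a more nuanced way.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0.0 or not 0.0 < clock_ratio <= 1.0:
        raise ValueError("vector_speedup must be > 0 and clock_ratio in (0, 1]")
    # Work left after vectorizing, measured in full-clock time units.
    remaining_work = (1.0 - vector_fraction) + vector_fraction / vector_speedup
    # Running at a lower clock stretches all of that work.
    return clock_ratio / remaining_work


def break_even_fraction(vector_speedup, clock_ratio):
    """Smallest vectorized share at which the clock penalty is paid back.

    Returns None if the penalty can never be recovered under this model.
    """
    if vector_speedup <= 1.0:
        return None
    fraction = (1.0 - clock_ratio) / (1.0 - 1.0 / vector_speedup)
    return fraction if fraction <= 1.0 else None
```

Under these assumptions, with a 2x vector speedup and a clock ratio of 0.8, vectorizing only 5 percent of the runtime leaves the program slower overall (a net speedup below 1), and the break-even point sits at roughly 40 percent of the runtime. That is the "seasoning" problem in miniature.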
Of course, making this kind of approach work would require much closer cooperation between OS and CPU than we currently see. During its event, Intel mentioned a hardware-aware scheduler block that Windows would presumably support and that might be used for assigning workloads depending on their execution characteristics. But even if the above scenario is wrong on the particulars, it’s an accurate model of how Intel, AMD, and other chip manufacturers increasingly think about performance. It’s not just a question of which architectural features your CPU supports, but of where it would be most advantageous to run a workload, given the current ambient conditions inside your specific PC and the various workloads it’s already running.
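As a purely illustrative toy, the kind of placement decision described above might look like the sketch below. Nothing here reflects Intel's actual scheduler block or any Windows API. The Task fields, the function names, and the policy itself (send AVX2 work that isn't latency sensitive to small cores, everything else to big cores) are assumptions that simply mirror the speculation in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical descriptors; a real scheduler would infer these at runtime.
    name: str
    uses_avx2: bool
    latency_sensitive: bool


def pick_core_type(task, free_big, free_small):
    """Return 'big', 'small', or None if no core of either type is free."""
    if free_big <= 0 and free_small <= 0:
        return None
    # Assumed policy: background vector work prefers the small cores,
    # which keeps the big cores clear for everything else.
    prefers_small = task.uses_avx2 and not task.latency_sensitive
    preferred, fallback = ("small", "big") if prefers_small else ("big", "small")
    free = {"big": free_big, "small": free_small}
    # At least one type is free here, so the fallback is valid
    # whenever the preferred type is exhausted.
    return preferred if free[preferred] > 0 else fallback


def assign(tasks, free_big, free_small):
    """Greedily place tasks in order; unplaced tasks map to None."""
    placement = {}
    for task in tasks:
        choice = pick_core_type(task, free_big, free_small)
        placement[task.name] = choice
        if choice == "big":
            free_big -= 1
        elif choice == "small":
            free_small -= 1
    return placement
```

For example, given one free core of each type, a latency-sensitive game thread would land on the big core, a background AVX2 encode would land on the small core, and a third task would have to wait. A real implementation would also weigh temperature, power limits, and what is already running, which is exactly the information a hardware-aware scheduler block would be positioned to supply.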
This sort of hand-in-glove operation is something we’ll achieve in stages rather than in one enormous leap. But to the extent that tighter OS/CPU communication can improve execution efficiency in any context, I’d expect to see chip manufacturers looking for ways to improve it. The old approach of using a single, uniform set of cores for every workload looks like it’s on its way out, long term.