The transition to Apple silicon Arm-based computers

Article By : Brian Dipert

Apple announced its overall strategy for them this summer, and they're finally here: the company's first Apple silicon Arm-based systems.

Back in June I told you it was going to happen, and that Apple claimed it would begin before year-end. It didn’t get announced in mid-October, but the analyst briefing alongside Apple’s subsequent earnings announcement dropped clues that it was still queued for near-term release. And earlier today (as I’m writing these words), it happened. What am I talking about? Apple announced its first three Apple silicon-based computers, along with details on the processor that powers all of them.



See for yourself:

John Hodgman, who played “PC Guy” in a popular series of “Get a Mac” promotions from Apple which ran in the late 2000s, even made a repeat performance:

Let’s begin with the “M1” SoC at the heart of all three computers:

Apple illustration of the M1 8-core CPU (Source: Apple)

For those of you keeping track, it launches a new “prefix” for the company; Axx is the best-known of the existing product line variants, representing application processors for iPhones, iPads, and the iPod Touch. And it’s a monster of a 5 nm-fabricated chip, containing 16 billion transistors, up from 11.8 billion on the A14 SoC that powers Apple’s various iPhone 12 flavors. Don’t pay too much attention to the relative performance claims in the above graphic, or the relative power consumption claims in this next one:

Apple chart of M1 CPU power (Source: Apple)

Eagle-eye critics have already pointed out that the comparisons weren’t necessarily “apples to apples” (pun intended), and as I’ll also note shortly, “your mileage may vary.” Nonetheless, it’s quite a design accomplishment.

The path that led to this latest IC is already well-trod to a degree; with the earlier A12-to-A12X/Z evolution, Apple also went from a “two-big/four-little” core arrangement to a fuller “four-and-four” core combination (this time around, perhaps obviously, the CPU cores themselves are more architecturally advanced). And furthering the analogy, the fundamental difference between the A12X and A12Z (as I noted last month) involves the fact that all eight GPU cores are active on the latter (versus only seven on the former); as you’ll soon see, Apple’s done this yield-maximization trick again with the M1.

As you read through the specifications on the SoC and the systems based on it, you’ll find one notable omission: no mention whatsoever of clock speeds, either nominal or “turbo.” Followers of Apple’s A-series SoCs won’t be surprised at this, but it’s a notable departure from the norm in the computer world. That said, it’s only a matter of time until the first wave of owners run (and publish) their own tests, and we actually already have a clue. Last week, benchmark results for something called the “A14X Bionic” showed up on Geekbench. They reference a processor with a 1.80 GHz base clock speed, capable of 3.10 GHz turbo boost, with 8 (four-and-four, again, remember) CPU cores and 8 GPU cores, and running in a system with 8 GBytes of DRAM.

Other notable aspects of the A14 to A14X … err … M1 evolution include per-core L1 cache growth from 128 Kbytes (data and code, each) to 192 Kbytes (ditto); the unified L2 cache size is unchanged at 8 Mbytes (a bit surprising considering the total core-count growth, but transistor count tradeoffs needed to be made, I guess). And, as the image below shows, just as with A-series SoCs, the DRAM is alongside the processor die in a unified-package setup:

Photo of the Apple M1 chip (Source: Apple)

Again, I’m a bit surprised. In a smartphone, where system board real estate is scant and precious, it makes complete sense. And I get that there may be performance (propagation delay, to be precise) and thermal benefits to the approach, especially given Apple’s UMA (unified memory architecture) arrangement that merges system and graphics memory allocations. But as we all quickly learned (thanks to pre-order details) shortly after the event’s conclusion, it also means that systems based on the M1 are only spec’d for up to 16 GBytes of system memory; one of the reasons I suspect Intel-based siblings still exist (for how long, who knows) is that larger-system-memory configurations are possible when the DRAM is discrete.

One other note on the processor before moving on. Much ink has been spilled over the years on the whole RISC (Arm and its ilk) vs CISC (x86, predominantly) tradeoffs. I’ll note, to begin, that PowerPC was supposedly a RISC architecture, but that didn’t work out well, at least in Apple’s case. I’ll also point out that due to instruction set growth over the years, the “R” in RISC can’t credibly be attributed to “reduced number of instructions” anymore (and anyway, as the above die shot shows, each CPU core, even the “big” ones, takes up a small percentage of total die area, plus on modern lithographies the total transistor budget is massive).

However, in reading through AnandTech’s as-usual thorough dissection of the M1, which I heartily commend to your attention, I was particularly struck by the following statement:

Other contemporary designs such as AMD’s Zen(1 through 3) and Intel’s µarch’s, x86 CPUs today still only feature a 4-wide decoder designs that is seemingly limited from going wider at this point in time due to the ISA’s inherent variable instruction length nature, making designing decoders that are able to deal with aspect of the architecture more difficult compared to the ARM ISA’s fixed-length instructions.

More generally, briefly summarizing AnandTech’s detailed prose, RISC’s (Arm’s, at least) instruction set length predictability has enabled Apple to create an incredibly wide-and-deep microarchitecture. The M1’s “big” core design, in contrast to the above, is believed to contain an 8-wide instruction decode block at the front end. Similarly, again quoting from AnandTech’s coverage:

One aspect of recent Apple designs which we were never really able to answer concretely is how deep their out-of-order execution capabilities are. The last official resource we had on the matter was a 192 figure for the ROB (Re-order Buffer) inside of the 2013 Cyclone design. Thanks again to Veedrac’s implementation of a test that appears to expose this part of the µarch, we can seemingly confirm that Firestorm’s ROB is in the 630 instruction range deep, which had been an upgrade from last year’s A13 Lightning core which is measured in at 560 instructions … An out-of-order window is the amount of instructions that a core can have “parked”, waiting for execution in, well, out of order sequence, whilst the core is trying to fetch and execute the dependencies of each instruction.

A +-630 deep ROB is an immensely huge out-of-order window for Apple’s new core, as it vastly outclasses any other design in the industry. Intel’s Sunny Cove and Willow Cove cores are the second-most “deep” OOO designs out there with a 352 ROB structure, while AMD’s newest Zen3 core makes due with 256 entries, and recent Arm designs such as the Cortex-X1 feature a 224 structure.
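To put those quoted reorder-buffer figures in proportion, here is a minimal sketch that simply divides them out. The entry counts are the approximate, measured-or-reported numbers from the AnandTech passage above, not official vendor specifications, and the dictionary and function names are my own illustrative choices.

```python
# ROB entry counts as quoted from AnandTech above (approximate figures,
# several of them inferred from microbenchmarks rather than published).
ROB_ENTRIES = {
    "Apple Firestorm": 630,
    "Apple Lightning (A13)": 560,
    "Intel Sunny Cove / Willow Cove": 352,
    "AMD Zen 3": 256,
    "Arm Cortex-X1": 224,
}


def rob_ratios(reference: str = "Apple Firestorm") -> dict:
    """Return how many times deeper the reference core's ROB is than each core's."""
    ref_entries = ROB_ENTRIES[reference]
    return {name: ref_entries / entries for name, entries in ROB_ENTRIES.items()}


if __name__ == "__main__":
    for name, ratio in rob_ratios().items():
        print(f"{name:32s} {ROB_ENTRIES[name]:4d} entries  (Firestorm is {ratio:.2f}x)")
```

Keep in mind that a deeper window is only one ingredient; the ratio says nothing by itself about delivered performance.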

And suffice it to say that M1’s parallel integer, floating point, and vector instruction execution units are also impressive in number, thanks in no small part to the aforementioned instruction set length predictability. Couple that with a beefy per-core L1 cache architecture and tightly coupled shared L2 cache and main memory, and you can see why it (seemingly, at least on paper) runs rings around alternative Arm- and other-ISA-processor implementations.
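Why does instruction length predictability matter so much for decode width? The following is a toy sketch, not a model of any real decoder. The variable-length encoding in it is hypothetical (it is emphatically not x86), and the function names are mine. It only illustrates the dependency structure: with fixed-width instructions every start offset is known up front, whereas with variable-length instructions each start depends on the length of the instruction before it.

```python
from typing import List


def fixed_length_starts(code: bytes, width: int = 4) -> List[int]:
    """Start offsets for a fixed-width encoding.

    Each offset is simply index * width, so no offset depends on any other;
    hardware can hand N consecutive slots to N decoders at once.
    """
    return [i * width for i in range(len(code) // width)]


def variable_length_starts(code: bytes) -> List[int]:
    """Start offsets for a HYPOTHETICAL variable-length encoding.

    Assumption for illustration only: the low two bits of an instruction's
    first byte encode its length minus one (1 to 4 bytes). Finding where
    instruction k+1 begins requires first examining instruction k, which
    forms a serial chain that a wide decoder has to work around.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += (code[pc] & 0b11) + 1
    return starts


if __name__ == "__main__":
    print(fixed_length_starts(bytes(32)))
    toy = bytes([0x00, 0x01, 0xFF, 0x03, 0xAA, 0xBB, 0xCC, 0x02, 0x11, 0x22])
    print(variable_length_starts(toy))
```

Real x86 front ends mitigate the serial chain with techniques such as predecode and micro-op caches, so treat this strictly as intuition for why going wider is harder, not as proof that it's impossible.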

Now for the systems themselves:

Photo of the Apple MacBook Air, MacBook Pro, and Mac mini (Source: Apple)

Without patting myself too much on the back, I’m going to requote something I wrote back in January 2019, when I first editorially explored the possibility of Apple moving its product line to Arm “en masse:”

I’m still not convinced that Apple’s planning on doing a slam-dunk conversion from Intel to its own processors in computers any time soon. But a more gradual consumer “pull”-based transition, beginning with systems that value improved battery life, smaller form factor, and lighter weight over absolute performance, is definitely likely.

Look at the above image and you’ll see the realization of that forecast; left-to-right are the new Arm-based 13” MacBook Air, 13” MacBook Pro, and Mac mini (the latter which nobody seemed to forecast in advance, but which the company had “telegraphed” via the A12Z-based “Developer Transition Kit”). Aside from a few I/O and keyboard tweaks, they look pretty much the same as their x86-based precursors, although Apple of course claims that they’ll run rings around their predecessors from both performance and power-consumption perspectives. On that note, two fundamental differences between the MacBook Air and MacBook Pro are:

  • The base variant of the Air uses a 7-core GPU version of the M1, while the Pro’s GPU is fully 8-core-enabled
  • The Air is fanless, while the Pro has active thermal management (i.e. a fan), presumably enabling higher nominal and/or “turbo” peak clock speeds in the latter case (again, not published, so difficult to ascertain for certain until systems are in customers’ hands)

The only form factor that at this early point in the transition has completely converted from x86 to Apple silicon is the MacBook Air. And you don’t yet see any hint of higher-end systems (the 16” MacBook Pro, for example, or the iMac or Mac Pro) even beginning to migrate. Why? Some of it has to do with hardware: professionals, for example, are going to expect higher-end graphics performance than Apple’s current integrated GPU core can deliver, whether this comes about via beefier Apple-designed graphics in the future, or via discrete and/or re-enabled external GPUs. Multi-CPU configurations to further boost the effective processor core count will also be desirable for multi-threaded application support (think video rendering, for example). I’ll even go out on a limb and guesstimate that we’ll continue to see new x86-based system announcements at the higher end of the line for a while yet (forever? Maybe a stretch).

And speaking of video rendering, the other key factor defining the high-end platform transition to Apple silicon, and the overall product line migration more generally, is software. Right now (until this week, to be precise) all of the MacOS applications as compiled are x86-only. Just as with the earlier PowerPC-to-x86 shift, Apple and its developer partners will navigate this particular CPU architecture conversion via a combination of:

  • “Universal” applications that are compiled to run on both CPU instruction sets, and
  • “Rosetta 2,” an emulation layer that allows x86-compiled code to run on Arm-based hardware, albeit with degraded performance and power consumption due to the inherent inefficiency of the approach.
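For the curious, a “Universal” application is at its core a “fat” Mach-O file: a small big-endian header listing one code slice per CPU architecture. Here is a minimal sketch of reading that header. The function name and the two-entry CPU table are my own simplifications, and the parser deliberately handles only the 32-bit fat header layout; on an actual Mac, `lipo -info` reports the same thing far more robustly.

```python
import struct
from typing import List

FAT_MAGIC = 0xCAFEBABE  # big-endian magic of the 32-bit fat header

# CPU type constants as defined in Apple's <mach/machine.h>.
CPU_NAMES = {
    0x01000007: "x86_64",
    0x0100000C: "arm64",
}


def fat_architectures(data: bytes) -> List[str]:
    """List the architecture slices named in a fat Mach-O header.

    Returns an empty list when the data is not a 32-bit fat binary.
    Caveat: Java class files share the same magic number, and this sketch
    makes no attempt to tell the two apart.
    """
    if len(data) < 8:
        return []
    magic, count = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        return []
    archs = []
    for i in range(count):
        offset = 8 + i * 20  # each fat_arch record holds five 32-bit fields
        if offset + 20 > len(data):
            break  # truncated header; stop rather than guess
        cputype, _subtype, _slice_offset, _size, _align = struct.unpack_from(
            ">5I", data, offset
        )
        archs.append(CPU_NAMES.get(cputype, f"unknown(0x{cputype:08x})"))
    return archs


if __name__ == "__main__":
    # Synthetic two-slice header standing in for a real Universal binary.
    header = struct.pack(">II", FAT_MAGIC, 2)
    header += struct.pack(">5I", 0x01000007, 3, 16384, 100, 14)
    header += struct.pack(">5I", 0x0100000C, 0, 32768, 100, 14)
    print(fat_architectures(header))
```

An app whose header lists only x86_64 is the kind that would need Rosetta 2 on the new machines.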

This time, there’s a further “twist;” since iOS apps are also (and already) Arm-compiled, an extension of the “Universal” concept will allow them to also run on Big Sur MacOS 11.0 (the Apple silicon version of the new O/S, to be precise, which is also “going gold” this week), but only if developers support it. And via something called Project Catalyst, which had been announced at the mid-2019 WWDC (as you can see, the company has been laying the groundwork for this CPU transition for a while), you can even develop a common-UI app for both the iPad and Mac (although the new Macs still aren’t touch interface-supportive, which is a bit surprising to me). But so far, at least, developers have been cautious (I’m being kind in using that particular word) at embracing these concepts.

More generally, although Apple’s own Mac apps are unsurprisingly already “Universal,” uptake from notable developer partners seems likely to be slow. Apple’s Final Cut Pro video editor is ready to go, for example, but subsequent to the version “X” launch back in 2011, it lost a lot of professionals to alternatives such as Adobe Premiere. But an Arm version of Adobe Premiere is nowhere in sight; more generally, Adobe won’t begin its Apple silicon shift until next month with Lightroom (“Universal” Adobe Photoshop, similarly, is MIA at the moment). Similarly, it took more than 2.5 years after Apple’s announcement of its migration to x86 for a “Universal” version of the Microsoft Office suite for Mac to appear. And as I know well from personal experience, running PowerPC-compiled Office on an Intel-based Mac via Rosetta emulation was slow and went through batteries like crazy. How long will it take for Microsoft to roll out an Arm-compiled Office suite? And might, heaven forbid, the company intentionally drag its feet?

Apple can talk all it wants about apps supposedly running faster emulated on Arm-based hardware than natively on Intel-based hardware, and I suppose the claim might be true in the one-off case, but for the bulk of situations, no matter that I found Rosetta to be functionally robust (and expect Rosetta 2 to be the same), I’ve got to believe that the emulated experience will be subpar. Not to mention the fact that the Arm-compiled application upgrades to come will inevitably be accompanied by upgrade price tags. Heck, right now I’m even resisting the migration to the newest 64-bit-only MacOS versions because the 32-bit applications (Adobe Creative Suite as a case study) that I’ve already paid for still work perfectly well for me.

I’m sure I’ll have more to say about Apple’s current status and future outlook in upcoming blog posts, along with the similar situation that Microsoft is experiencing with its Arm-based Surface Pro X product line, but I’ve just crossed through 2,000 words so I’ll stop for now. 😉 Sound off with your thoughts in the comments!

This article was originally published on EDN.

Brian Dipert is Editor-in-Chief of the Edge AI and Vision Alliance, and a Senior Analyst at BDTI and Editor-in-Chief of InsideDSP, the company’s online newsletter.
