Comment by DuckConference

DuckConference 2 days ago parent

They're big, expensive chips with a focus on power efficiency. AMD and Intel's chips that are on the big and expensive side tend toward being optimized for higher power ranges, so they don't compete well on efficiency, while their more power efficient chips tend toward being optimized for size/cost.

If you're willing to spend a bunch of die area (which directly translates into cost) you can get good numbers on the other two legs of the Power-Performance-Area triangle. The issue is that the market position of Apple's competitors is such that it doesn't make as much sense for them to make such big and expensive chips (particularly CPU cores) in a mobile-friendly power envelope.

aurareturn 2 days ago

Per core, Apple’s Performance cores are no bigger than AMD’s Zen cores. So it’s a myth that they’re only fast and efficient because they are big.

What makes Apple silicon chips big is they bolt on a fast GPU on it. If you include the die of a discrete GPU with an x86 chip, it’d be the same or bigger than M series.

You can look at Intel’s Lunar Lake as an example where it’s physically bigger than an M4 but slower in CPU, GPU, NPU and has way worse efficiency.

Another comparison is AMD Strix Halo. Despite being ~1.5x bigger than the M4 Pro, it has worse efficiency, ST performance, and GPU performance. It does have slightly more MT.

chasil 2 days ago

Is it not true that the instruction decoder is always active on x86, and is quite complex?

Such a decoder is vastly less sophisticated with AArch64.

That is one obvious architectural drawback for power efficiency: a legacy instruction set with variable word length, two FPUs (x87 and SSE), 16-bit compatibility with segmented memory, and hundreds of otherwise unused opcodes.

How much legacy must Apple implement? Non-kernel AArch32 and Thumb2?

Edit: think about it... R4000 was the first 64-bit MIPS in 1991. AMD64 was introduced in 2000.

AArch64 emerged in 2011, and in taking their time, the designers avoided the mistakes made by others.

daeken 2 days ago

There's no AArch32 or Thumb support (A32/T32) on M-series chips. AArch64 (technically A64) is the only supported instruction set. Fun fact: this makes it impossible to run Mario Kart 8 via virtualization on Macs without software translation, since it's A32.

How much that does for efficiency I can't say, but I imagine it helps, especially given just how damn easy it is to decode.

averne_ 2 days ago

It actually doesn't make much difference: https://chipsandcheese.com/i/138977378/decoder-differences-a...

chasil 2 days ago

I had not realized that Apple did not implement any of the 32-bit ARM environment, but that cuts the legs out of this argument in the article:

"In Anandtech’s interview, Jim Keller noted that both x86 and ARM both added features over time as software demands evolved. Both got cleaned up a bit when they went 64-bit, but remain old instruction sets that have seen years of iteration."

I still say that x86 must run two FPUs all the time, and that has to cost some power (AMD must run three - it also has 3dNow).

Intel really couldn't resist adding instructions with each new chip (MMX, PAE for 32-bit, many more on this shorthand list that I don't know), which are now mostly baggage.

theevilsharpie 2 days ago

> I still say that x86 must run two FPUs all the time, and that has to cost some power (AMD must run three - it also has 3dNow).

Legacy floating-point and SIMD instructions exposed by the ISA (and extensions to it) don't have any bearing on how the hardware works internally.

Additionally, AMD processors haven't supported 3DNow! in over a decade -- K10 was the last processor family to support it.

daeken 2 days ago

Oh wow, I need to dig way deeper into this but wonderful resource - thanks!

Fluorescence 2 days ago

> Despite being ~1.5x bigger than the M4 Pro

Where are you getting M4 die sizes from?

It would hardly be surprising given the Max+ 395 has more, and on average, better cores fabbed with 5nm unlike the M4's 3nm. Die size is mostly GPU though.

Looking at some benchmarks:

> slightly more MT.

AMD's multicore passmark score is more than 40% higher.

https://www.cpubenchmark.net/compare/6345vs6403/Apple-M4-Pro...

> worse efficiency

The AMD is an older fab process and does not have P/E cores. What are you measuring?

> worse ST performance

The P/E design choice gives different trade-offs e.g. AMD has much higher average single core perf.

> worse GPU performance

The AMD GPU:

14.8 TFLOPS vs. M4 Pro 9.2 TFLOPS.

19% higher 3D Mark

34% higher GeekBench 6 OpenCL

Although a much crappier Blender score. I wonder what that's about.

https://nanoreview.net/en/gpu-compare/radeon-8060s-vs-apple-...

aurareturn 2 days ago

  Where are you getting M4 die sizes from?

M1 Pro is ~250mm2. M4 Pro likely increased in size a bit. So I estimated 300mm2. There are no official measurements but should be directionally correct.

  AMD's multicore passmark score is more than 40% higher.

It's an out of date benchmark that not even AMD endorses and the industry does not use. Meanwhile, AMD officially endorses Cinebench 2024 and Geekbench. Let's use those.

   The AMD is an older fab process and does not have P/E cores. What are you measuring?

Efficiency. Fab process does not account for the 3.65x efficiency deficit. N4 to N3 is roughly ~20-25% more efficient at the same speed.

  The P/E design choice gives different trade-offs e.g. AMD has much higher average single core perf.

Citation needed. Further more, macOS uses P cores for all the important tasks and E cores for background tasks. I fail to see why even if AMD has a higher average ST would translate to better experience for users.

  14.8 TFLOPS vs. M4 Pro 9.2 TFLOPS.

TFLOPs are not the same between architectures.

  19% higher 3D Mark

Equal in 3DMark Wildlife, loses vs M4 Pro in Blender.

  34% higher GeekBench 6 OpenCL

OpenCL has long been deprecated on macOS. 105727 is the score for Metal, which is supported by macOS. 15% faster for M4 Pro.

The GPUs themselves are roughly equal. However, Strix Halo is still a bigger SoC.

vient 2 days ago

> TFLOPs are not the same between architectures.

Shouldn't they be the same if we are speaking about same precision? For example, [0] shows M4 Max 17 TFLOPS FP32 vs MAX+ 395 29.7 TPLOFS FP32 - not sure what exact operation was measured but at least it should be the same operation. Hard to make definitive statements without access to both machines.

[0] https://www.cpu-monkey.com/en/compare_cpu-apple_m4_max_16_cp...

aurareturn 2 days ago

M4 Max doesn't even disclose TFLOPS so no clue where that website got the numbers from.

TFLOPS can't be measured the same between generations. For example, Nvidia often quotes sparsity TFLOPS which doubles the dense TFLOPS previously reported. I think AMD probably does the same for consumer GPUs.

Another example is Radeon RX Vega 64 which had 12.7 TFLOPS FP32. Yet, Radeon RX 5700 XT with just 9.8 TFLOPS FP32 absolutely destroyed it in gaming.

Fluorescence 2 days ago

What a waste of time.

"directionally correct"... so you don't know and made up some numbers? Great.

AMD doesn't "endorse benchmarks" especially not fucking Geekbench for multi-core. No-one could because it's famously nonsense for higher core counts. AMD's decade old beef with Sysmark was about pro-Intel bias.

aurareturn 2 days ago

  "directionally correct"... so you don't know and made up some numbers? Great.

I never said it was exactly that size. Apple keeps the sizes of their base, Pro, and Max chips fairly consistent over generations.

Welcome to the world of chip discussions. I've never taken apart and M4 Pro computer and measured the die myself. It appears no one has on the internet. However, we can infer a lot of it based on previously known facts. In this case, we know M1 Pro's die size is around 250mm2.

  AMD doesn't "endorse benchmarks" especially not fucking Geekbench for multi-core. No-one could because it's famously nonsense for higher core counts. AMD's decade old beef with Sysmark was about pro-Intel bias.

Geekbench is the main benchmark AMD tends to use: https://videocardz.com/newz/amd-ryzen-5-7600x-has-already-be...

The reason is because Geekbench correlates highly with SPEC, which is the industry standard.

6 More Comments →

wordofx 2 days ago (dead)

This item has no comments currently.