Such a decoder is vastly less sophisticated with AArch64.
That is one obvious architectural drawback for power efficiency: a legacy instruction set with variable word length, two FPUs (x87 and SSE), 16-bit compatibility with segmented memory, and hundreds of otherwise unused opcodes.
How much legacy must Apple implement? Non-kernel AArch32 and Thumb-2?
Edit: think about it... R4000 was the first 64-bit MIPS, in 1991. The AMD64 spec was introduced in 2000 (first silicon shipped in 2003).
AArch64 emerged in 2011, and in taking their time, the designers avoided the mistakes made by others.
How much that does for efficiency I can't say, but I imagine it helps, especially given just how damn easy it is to decode.
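To make the decode point concrete, here's a toy sketch (mine, heavily simplified: the x86 side only handles prefix bytes, while real length-decoding must also parse ModRM/SIB, displacements, and immediates) of why fixed 4-byte instructions are so much easier to find in parallel:

    # Toy model of finding instruction boundaries. The AArch64 rule is
    # genuinely this simple; the x86 side is a sketch of the serial
    # dependency, not a real length decoder.

    def aarch64_boundaries(code: bytes) -> list[int]:
        # Every instruction is exactly 32 bits, so boundaries are just
        # 0, 4, 8, ... -- a wide decoder can start at all of them at once.
        return list(range(0, len(code), 4))

    # Real x86 prefix bytes (LOCK, REP, operand/address size, segment overrides).
    X86_PREFIXES = {0xF0, 0xF2, 0xF3, 0x66, 0x67, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65}

    def x86_length(code: bytes, pos: int) -> int:
        # Count a variable number of prefixes, then pretend the opcode is
        # one byte. Reality adds ModRM/SIB/displacement/immediate parsing.
        n = 0
        while pos + n < len(code) and code[pos + n] in X86_PREFIXES:
            n += 1
        return n + 1

    def x86_boundaries(code: bytes) -> list[int]:
        # Serial: instruction N+1's start isn't known until instruction N
        # has been fully length-decoded.
        pos, out = 0, []
        while pos < len(code):
            out.append(pos)
            pos += x86_length(code, pos)
        return out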
"In Anandtech’s interview, Jim Keller noted that both x86 and ARM both added features over time as software demands evolved. Both got cleaned up a bit when they went 64-bit, but remain old instruction sets that have seen years of iteration."
I still say that x86 must run two FPUs all the time, and that has to cost some power (AMD must run three - it also has 3DNow!).
Intel really couldn't resist adding instructions with each new chip (MMX, PAE for addressing beyond 4GB on 32-bit, and many more on this shorthand list that I don't recognize), which are now mostly baggage.
Legacy floating-point and SIMD instructions exposed by the ISA (and extensions to it) don't have any bearing on how the hardware works internally.
Additionally, AMD processors haven't supported 3DNow! in over a decade -- K10 was the last processor family to support it.
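A toy way to picture that (my sketch, not any real front end; the opcode bytes are the real encodings but the uop name is invented): the decoder can crack both the legacy and the modern encoding into the same internal micro-op, so two ISA-level FPUs don't mean two powered-up FPUs:

    # Toy decode table: x87 FADDP and SSE2 ADDSD are different ISA-level
    # encodings, but both can map to the same physical FP adder.
    DECODE = {
        (0xDE, 0xC1): "fp_add_uop",        # x87 FADDP (legacy stack FPU)
        (0xF2, 0x0F, 0x58): "fp_add_uop",  # SSE2 ADDSD (modern flat registers)
    }

    def crack(opcode: tuple) -> str:
        # Same adder either way; x87 just pays extra uops for its
        # register-stack bookkeeping, not a second always-on FPU.
        return DECODE[opcode]

    print(crack((0xDE, 0xC1)), crack((0xF2, 0x0F, 0x58)))  # fp_add_uop fp_add_uop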
Where are you getting M4 die sizes from?
It would hardly be surprising given the Max+ 395 has more, and on average better, cores, fabbed on 5nm unlike the M4's 3nm. Die size is mostly GPU, though.
Looking at some benchmarks:
> slightly more MT.
AMD's multicore passmark score is more than 40% higher.
https://www.cpubenchmark.net/compare/6345vs6403/Apple-M4-Pro...
> worse efficiency
The AMD is an older fab process and does not have P/E cores. What are you measuring?
> worse ST performance
The P/E design choice gives different trade-offs e.g. AMD has much higher average single core perf.
> worse GPU performance
The AMD GPU:
14.8 TFLOPS vs. M4 Pro 9.2 TFLOPS.
19% higher 3D Mark
34% higher GeekBench 6 OpenCL
Although a much crappier Blender score. I wonder what that's about.
https://nanoreview.net/en/gpu-compare/radeon-8060s-vs-apple-...
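For concreteness, the raw gap from those numbers (same figures as above, just arithmetic):

    # Ratio of the quoted theoretical FP32 throughput.
    print(f"{14.8 / 9.2:.2f}x")  # ~1.61x more raw TFLOPS
    # ...against 19% higher 3DMark and 34% higher GB6 OpenCL in practice.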
> Where are you getting M4 die sizes from?
M1 Pro is ~250mm2. M4 Pro likely increased in size a bit, so I estimated 300mm2. There are no official measurements, but it should be directionally correct.
> AMD's multicore passmark score is more than 40% higher.

It's an out-of-date benchmark that not even AMD endorses and the industry does not use. Meanwhile, AMD officially endorses Cinebench 2024 and Geekbench. Let's use those.
> The AMD is an older fab process and does not have P/E cores. What are you measuring?

Efficiency. Fab process does not account for the 3.65x efficiency deficit. N4 to N3 is roughly ~20-25% more efficient at the same speed.
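Quick back-of-envelope (using the 3.65x and 20-25% figures already in this thread, not new measurements):

    # Divide the node advantage out of the claimed efficiency gap to see
    # what's left for the design itself.
    gap = 3.65
    for node_gain in (1.20, 1.25):
        print(f"{node_gain:.2f}x from the node -> {gap / node_gain:.2f}x unexplained")
    # 1.20x from the node -> 3.04x unexplained
    # 1.25x from the node -> 2.92x unexplained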
> The P/E design choice gives different trade-offs e.g. AMD has much higher average single core perf.

Citation needed. Furthermore, macOS uses P cores for all the important tasks and E cores for background tasks. I fail to see how a higher average ST on AMD would translate to a better experience for users.
> 14.8 TFLOPS vs. M4 Pro 9.2 TFLOPS.

TFLOPS are not directly comparable between architectures.
> 19% higher 3D Mark

Equal in 3DMark Wild Life; loses to the M4 Pro in Blender.
> 34% higher GeekBench 6 OpenCL

OpenCL has long been deprecated on macOS. 105727 is the score for Metal, which is supported by macOS: 15% faster for the M4 Pro. The GPUs themselves are roughly equal. However, Strix Halo is still a bigger SoC.
Shouldn't they be the same if we are speaking about the same precision? For example, [0] shows the M4 Max at 17 TFLOPS FP32 vs. the MAX+ 395 at 29.7 TFLOPS FP32 - not sure what exact operation was measured, but at least it should be the same operation. Hard to make definitive statements without access to both machines.
[0] https://www.cpu-monkey.com/en/compare_cpu-apple_m4_max_16_cp...
TFLOPS aren't quoted the same way between vendors and generations. For example, Nvidia often quotes sparsity TFLOPS, which double the dense TFLOPS previously reported. I think AMD probably does the same for consumer GPUs.
Another example is Radeon RX Vega 64 which had 12.7 TFLOPS FP32. Yet, Radeon RX 5700 XT with just 9.8 TFLOPS FP32 absolutely destroyed it in gaming.
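For reference, the headline number is just theoretical peak: ALU lanes x 2 (an FMA counts as two FLOPs) x clock. A sketch below; the lane counts and clocks are my assumptions from public spec listings, not measurements:

    # Theoretical peak FP32 TFLOPS = lanes * 2 (FMA = mul + add) * GHz / 1000.
    def tflops(lanes: int, ghz: float) -> float:
        return lanes * 2 * ghz / 1000

    # Assumed configs: Radeon 8060S ~40 CUs x 64 lanes @ ~2.9 GHz,
    # M4 Pro GPU ~20 cores x 128 ALUs @ ~1.8 GHz.
    print(f"{tflops(2560, 2.9):.1f}")  # ~14.8
    print(f"{tflops(2560, 1.8):.1f}")  # ~9.2
    # Same formula, same precision -- but utilization, drivers, and memory
    # bandwidth decide delivered performance (the Vega vs 5700 XT case).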
"directionally correct"... so you don't know and made up some numbers? Great.
AMD doesn't "endorse benchmarks", especially not fucking Geekbench for multi-core. No one could, because it's famously nonsense at higher core counts. AMD's decade-old beef with SYSmark was about pro-Intel bias.
"directionally correct"... so you don't know and made up some numbers? Great.
I never said it was exactly that size. Apple keeps the sizes of their base, Pro, and Max chips fairly consistent across generations. Welcome to the world of chip discussions: I've never taken apart an M4 Pro and measured the die myself, and it appears no one on the internet has either. However, we can infer a lot from previously known facts. In this case, we know the M1 Pro's die size is around 250mm2.
> AMD doesn't "endorse benchmarks", especially not fucking Geekbench for multi-core. No one could, because it's famously nonsense at higher core counts. AMD's decade-old beef with SYSmark was about pro-Intel bias.
Geekbench is the main benchmark AMD tends to use: https://videocardz.com/newz/amd-ryzen-5-7600x-has-already-be...

The reason is that Geekbench correlates highly with SPEC, which is the industry standard.
That three-year-old press release refers to SINGLE-CORE Geekbench, not the defective multicore version that doesn't scale with core counts. Given AMD's main USP is core count, that would be an... unusual choice.
AMD marketing uses every other benchmark under the sun too (no doubt whatever gives the better-looking numbers)... including Passmark, e.g. it's on this Strix Halo page:
https://www.amd.com/en/products/processors/ai-pc-portfolio-l...
So I guess that means Passmark is "endorsed" by AMD too, eh? Neat.
What makes Apple silicon chips big is that they bolt a fast GPU onto them. If you include the die of a discrete GPU with an x86 chip, it'd be the same size as an M series or bigger.
You can look at Intel's Lunar Lake as an example: it's physically bigger than an M4 but slower in CPU, GPU, and NPU, and has way worse efficiency.
Another comparison is AMD Strix Halo. Despite being ~1.5x bigger than the M4 Pro, it has worse efficiency, ST performance, and GPU performance. It does have slightly more MT.