Such a decoder is vastly less sophisticated with AArch64.
That is one obvious architectural drawback for power efficiency: a legacy instruction set with variable word length, two FPUs (x87 and SSE), 16-bit compatibility with segmented memory, and hundreds of otherwise unused opcodes.
How much legacy must Apple implement? Non-kernel AArch32 and Thumb-2?
Edit: think about it... R4000 was the first 64-bit MIPS, in 1991. The AMD64 spec was introduced in 2000 (first silicon shipped in 2003).
AArch64 emerged in 2011, and in taking their time, the designers avoided the mistakes made by others.
How much that does for efficiency I can't say, but I imagine it helps, especially given just how damn easy it is to decode.
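To make the decode point concrete, here's a toy sketch (mine, heavily simplified: the x86 side only handles prefix bytes, while real length-decoding must also parse ModRM/SIB, displacements, and immediates) of why fixed 4-byte instructions are so much easier to find in parallel:

    # Toy model of finding instruction boundaries. The AArch64 rule is
    # genuinely this simple; the x86 side is a sketch of the serial
    # dependency, not a real length decoder.

    def aarch64_boundaries(code: bytes) -> list[int]:
        # Every instruction is exactly 32 bits, so boundaries are just
        # 0, 4, 8, ... -- a wide decoder can start at all of them at once.
        return list(range(0, len(code), 4))

    # Real x86 prefix bytes (LOCK, REP, operand/address size, segment overrides).
    X86_PREFIXES = {0xF0, 0xF2, 0xF3, 0x66, 0x67, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65}

    def x86_length(code: bytes, pos: int) -> int:
        # Count a variable number of prefixes, then pretend the opcode is
        # one byte. Reality adds ModRM/SIB/displacement/immediate parsing.
        n = 0
        while pos + n < len(code) and code[pos + n] in X86_PREFIXES:
            n += 1
        return n + 1

    def x86_boundaries(code: bytes) -> list[int]:
        # Serial: instruction N+1's start isn't known until instruction N
        # has been fully length-decoded.
        pos, out = 0, []
        while pos < len(code):
            out.append(pos)
            pos += x86_length(code, pos)
        return out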
"In Anandtech’s interview, Jim Keller noted that both x86 and ARM both added features over time as software demands evolved. Both got cleaned up a bit when they went 64-bit, but remain old instruction sets that have seen years of iteration."
I still say that x86 must run two FPUs all the time, and that has to cost some power (AMD must run three - it also has 3DNow!).
Intel really couldn't resist adding instructions with each new chip (MMX, PAE for addressing beyond 4GB on 32-bit, and many more on this shorthand list that I don't recognize), which are now mostly baggage.
Legacy floating-point and SIMD instructions exposed by the ISA (and extensions to it) don't have any bearing on how the hardware works internally.
Additionally, AMD processors haven't supported 3DNow! in over a decade -- K10 was the last processor family to support it.
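A toy way to picture that (my sketch, not any real front end; the opcode bytes are the real encodings but the uop name is invented): the decoder can crack both the legacy and the modern encoding into the same internal micro-op, so two ISA-level FPUs don't mean two powered-up FPUs:

    # Toy decode table: x87 FADDP and SSE2 ADDSD are different ISA-level
    # encodings, but both can map to the same physical FP adder.
    DECODE = {
        (0xDE, 0xC1): "fp_add_uop",        # x87 FADDP (legacy stack FPU)
        (0xF2, 0x0F, 0x58): "fp_add_uop",  # SSE2 ADDSD (modern flat registers)
    }

    def crack(opcode: tuple) -> str:
        # Same adder either way; x87 just pays extra uops for its
        # register-stack bookkeeping, not a second always-on FPU.
        return DECODE[opcode]

    print(crack((0xDE, 0xC1)), crack((0xF2, 0x0F, 0x58)))  # fp_add_uop fp_add_uop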
Where are you getting M4 die sizes from?
It would hardly be surprising given the Max+ 395 has more, and on average better, cores, fabbed on 5nm unlike the M4's 3nm. Die size is mostly GPU, though.
Looking at some benchmarks:
> slightly more MT.
AMD's multicore passmark score is more than 40% higher.
https://www.cpubenchmark.net/compare/6345vs6403/Apple-M4-Pro...
> worse efficiency
The AMD is an older fab process and does not have P/E cores. What are you measuring?
> worse ST performance
The P/E design choice gives different trade-offs e.g. AMD has much higher average single core perf.
> worse GPU performance
The AMD GPU:
14.8 TFLOPS vs. M4 Pro 9.2 TFLOPS.
19% higher 3D Mark
34% higher GeekBench 6 OpenCL
Although a much crappier Blender score. I wonder what that's about.
https://nanoreview.net/en/gpu-compare/radeon-8060s-vs-apple-...
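For concreteness, the raw gap from those numbers (same figures as above, just arithmetic):

    # Ratio of the quoted theoretical FP32 throughput.
    print(f"{14.8 / 9.2:.2f}x")  # ~1.61x more raw TFLOPS
    # ...against 19% higher 3DMark and 34% higher GB6 OpenCL in practice.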
> Where are you getting M4 die sizes from?
M1 Pro is ~250mm2. M4 Pro likely increased in size a bit, so I estimated 300mm2. There are no official measurements, but it should be directionally correct.
> AMD's multicore passmark score is more than 40% higher.

It's an out-of-date benchmark that not even AMD endorses and the industry does not use. Meanwhile, AMD officially endorses Cinebench 2024 and Geekbench. Let's use those.
> The AMD is an older fab process and does not have P/E cores. What are you measuring?

Efficiency. Fab process does not account for the 3.65x efficiency deficit. N4 to N3 is roughly ~20-25% more efficient at the same speed.
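Quick back-of-envelope (using the 3.65x and 20-25% figures already in this thread, not new measurements):

    # Divide the node advantage out of the claimed efficiency gap to see
    # what's left for the design itself.
    gap = 3.65
    for node_gain in (1.20, 1.25):
        print(f"{node_gain:.2f}x from the node -> {gap / node_gain:.2f}x unexplained")
    # 1.20x from the node -> 3.04x unexplained
    # 1.25x from the node -> 2.92x unexplained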
> The P/E design choice gives different trade-offs e.g. AMD has much higher average single core perf.

Citation needed. Furthermore, macOS uses P cores for all the important tasks and E cores for background tasks. I fail to see how a higher average ST on AMD would translate to a better experience for users.
> 14.8 TFLOPS vs. M4 Pro 9.2 TFLOPS.

TFLOPS are not directly comparable between architectures.
> 19% higher 3D Mark

Equal in 3DMark Wild Life; loses to the M4 Pro in Blender.
> 34% higher GeekBench 6 OpenCL

OpenCL has long been deprecated on macOS. 105727 is the score for Metal, which is supported by macOS: 15% faster for the M4 Pro. The GPUs themselves are roughly equal. However, Strix Halo is still a bigger SoC.
Shouldn't they be the same if we are speaking about the same precision? For example, [0] shows the M4 Max at 17 TFLOPS FP32 vs. the MAX+ 395 at 29.7 TFLOPS FP32 - not sure what exact operation was measured, but at least it should be the same operation. Hard to make definitive statements without access to both machines.
[0] https://www.cpu-monkey.com/en/compare_cpu-apple_m4_max_16_cp...
TFLOPS aren't quoted the same way between vendors and generations. For example, Nvidia often quotes sparsity TFLOPS, which double the dense TFLOPS previously reported. I think AMD probably does the same for consumer GPUs.
Another example is Radeon RX Vega 64 which had 12.7 TFLOPS FP32. Yet, Radeon RX 5700 XT with just 9.8 TFLOPS FP32 absolutely destroyed it in gaming.
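For reference, the headline number is just theoretical peak: ALU lanes x 2 (an FMA counts as two FLOPs) x clock. A sketch below; the lane counts and clocks are my assumptions from public spec listings, not measurements:

    # Theoretical peak FP32 TFLOPS = lanes * 2 (FMA = mul + add) * GHz / 1000.
    def tflops(lanes: int, ghz: float) -> float:
        return lanes * 2 * ghz / 1000

    # Assumed configs: Radeon 8060S ~40 CUs x 64 lanes @ ~2.9 GHz,
    # M4 Pro GPU ~20 cores x 128 ALUs @ ~1.8 GHz.
    print(f"{tflops(2560, 2.9):.1f}")  # ~14.8
    print(f"{tflops(2560, 1.8):.1f}")  # ~9.2
    # Same formula, same precision -- but utilization, drivers, and memory
    # bandwidth decide delivered performance (the Vega vs 5700 XT case).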
"directionally correct"... so you don't know and made up some numbers? Great.
AMD doesn't "endorse benchmarks", especially not fucking Geekbench for multi-core. No one could, because it's famously nonsense at higher core counts. AMD's decade-old beef with SYSmark was about pro-Intel bias.
"directionally correct"... so you don't know and made up some numbers? Great.
I never said it was exactly that size. Apple keeps the sizes of their base, Pro, and Max chips fairly consistent across generations. Welcome to the world of chip discussions: I've never taken apart an M4 Pro and measured the die myself, and it appears no one on the internet has either. However, we can infer a lot from previously known facts. In this case, we know the M1 Pro's die size is around 250mm2.
> AMD doesn't "endorse benchmarks", especially not fucking Geekbench for multi-core. No one could, because it's famously nonsense at higher core counts. AMD's decade-old beef with SYSmark was about pro-Intel bias.
Geekbench is the main benchmark AMD tends to use: https://videocardz.com/newz/amd-ryzen-5-7600x-has-already-be...

The reason is that Geekbench correlates highly with SPEC, which is the industry standard.
That three-year-old press release refers to SINGLE-CORE Geekbench, not the defective multicore version that doesn't scale with core counts. Given AMD's main USP is core count, that would be an... unusual choice.
AMD marketing uses every other benchmark under the sun too (no doubt whatever gives the better-looking numbers)... including Passmark, e.g. it's on this Strix Halo page:
https://www.amd.com/en/products/processors/ai-pc-portfolio-l...
So I guess that means Passmark is "endorsed" by AMD too, eh? Neat.
What makes Apple silicon chips big is that they bolt a fast GPU onto them. If you include the die of a discrete GPU with an x86 chip, it'd be the same size as an M series or bigger.
You can look at Intel's Lunar Lake as an example: it's physically bigger than an M4 but slower in CPU, GPU, and NPU, and has way worse efficiency.
Another comparison is AMD Strix Halo. Despite being ~1.5x bigger than the M4 Pro, it has worse efficiency, ST performance, and GPU performance. It does have slightly more MT.