Apple spent years incrementally improving the efficiency and performance of their chips for phones. Intel and AMD were more desktop-based, so power efficiency wasn't the goal. When Apple's chips got so good they could transition into laptops, x86 wasn't in the same ballpark.
Also, the iPhone is the most lucrative product of all time (I think), and Apple poured a tonne of that money into R&D and into hiring top engineers from Intel, AMD, and ARM, building one of the best silicon teams.
NeXT? But yes, I completely get what you’re saying, I just couldn’t resist. It was an amazingly long-sighted strategic move, for sure.
MeeGo proceeded far too slowly, and in 2011 Elop chose his former employer's Windows instead. Nokia's decline only accelerated, and Intel hired many Nokia engineers.
Soon Nokia wasn't making phones at all, and Intel never even managed to ship its first mass-selling mobile product.
ARM-based SoCs were 10 years ahead in power saving. The ARM ecosystem did not make any fatal mistakes, and Intel never caught up.
To me the Atom team always felt like a dead end inside Intel - everyone seemed to be trying to get into a different, higher-status team ASAP - our engineering contacts often changed monthly, if we even knew who our "contacts" were meant to be at any given time. I think any product developed like that would struggle.
How much silicon did Apple actually create? I thought they outsourced all the components?
I remember the iPaq PDA fondly. I wrote a demo that selected a song by voice query from a playlist of a few thousand artist-album-song entries. The WiFi add-on was a big plastic "sleeve" that the iPaq slid into, not the other way around. It could run the ASR engine for about 10 whole minutes before it drained the battery flat, haha. :-)
The Newton was long before the iPaq; the MessagePad was released in 1993.
Q> And later - was it that Apple looked like it might be in danger of going under, and then they sold their ARM stake and got a cash injection that way?
A> And yes. In the late-1990s turnaround, Apple sold down its ARM stake in multiple tranches after ARM’s 1998 IPO, realizing hundreds of millions of dollars that helped shore up finances (alongside the well-known $150 million Microsoft deal in Aug 1997).
CPU, GPU, neural processor, image signal processor, U1 chip for device tracking, Secure Enclave for biometrics, a 5G modem (only used in the 16e so far)…
They don’t manufacture the chips in house of course. They contract that out to TSMC and other companies.
See Lunar Lake on TSMC N3B, 4+4, on-package DRAM versus the M3 on TSMC N3B, 4+4, on-package DRAM: https://youtu.be/ymoiWv9BF7Q?t=531
The 258V (TSMC N3B) has a worse 1T (single-thread) perf/W curve than the Apple M1 (TSMC N5).
Also, there's the obvious benefits of being TSMC's best customer. And when you design a chip for low power consumption, that means you've got a higher ceiling when you introduce cooling.
No, the main reason for the better battery life is the RISC architecture. PCs on the ARM architecture see the same gains.
I'm not wrong!
https://chipsandcheese.com/p/arm-or-x86-isa-doesnt-matter
https://chipsandcheese.com/p/why-x86-doesnt-need-to-die
All instructions, on both x86 and Arm, are decoded into micro-operations, which are implementation-specific. You could have an implementation that prioritizes performance, or an implementation that prioritizes power consumption, regardless of the ISA.
Decoding instructions, particularly on a modern die, doesn’t consume a significant amount of area or power, even for complicated variable length instructions.
If all that's true, then why does Snapdragon have better battery life? As I said in my comment, the great battery life comes from when the CPU isn't being used. It's everything else around it. That's where AMD is still significantly behind.
Apple is vertically integrated and can optimize at the OS level and for many of the applications they ship with the device.
Compare that to how many cooks are in the kitchen in Wintel land. Perfect example is trying to get to the bottom of why your Windows laptop won't go to sleep and cooks itself in your backpack. Unless something's changed, last I checked it was a circular firing squad between the laptop manufacturer, Microsoft, and various hardware vendors, all blaming each other.
> Compare that to how many cooks are in the kitchen in Wintel land. Perfect example is trying to get to the bottom of why your windows laptop won't go to sleep and cooks itself in your backpack
So, I was thinking like this as well, and after I lost my Carbon X1 I felt adventurous, but not too adventurous, and wanted a laptop that "could just work". The thinking was "If Microsoft makes both the hardware and the software, it has to work perfectly fine, right?", so I bit my lip and got a Surface Pro 8.
What a horrible laptop that was, even while I was trialing just running Windows on it. It overheated almost immediately by itself, just idling, and it STILL suffers from the issue where the laptop sometimes wakes itself while in my backpack, so when I actually needed it, of course it was hot and out of battery. I've owned a lot of shit laptops through the years, even some with keys missing from the keyboard, back when I was dirt-poor, but the Surface Pro 8 is the worst of them all; I regret buying it a lot.
I guess my point is that just because Apple seems really good at the whole "vertically integrated" concept, it isn't magic by itself, and Microsoft continues to fuck up the very same thing even though they control the entire stack, so you'll still end up with backpack laptops turning themselves on / not turning off properly.
I'd wager you could let Microsoft own every piece of physical material in the world, and they'd still not be able to make a decent laptop.
That's why Apple is good at making a whole single system that works by itself, and Microsoft is good at making a system that works with almost everything almost everyone has made almost ever.
The 2nd biggest disappointment was when I ran my team's compute-heavy workload locally, expecting blistering performance from the i9, only to find that the CPU got throttled to under 50% (I seem to recall 47%, but my memory is fuzzy) within 6 seconds of starting the workload. And this was essentially a brand-new laptop, so it likely wasn't blocked fan intakes. I fail to see the point of putting a CPU in a laptop that your thermal design simply can't handle.
> Perfect example is trying to get to the bottom of why your windows laptop won't go to sleep and cooks itself in your backpack
Same thing happens in Apple land: https://www.hackerneue.com/item?id=44745897. My Framework 16 hasn't had this issue, although the battery does deplete slowly due to shitty modern standby.
Apple has this. It's called Power Nap. But for some reason, it doesn't cause the same problems reported by people here on HN.
> Framework 16
> The 2nd Gen Keyboard retains the same hardware as the 1st Gen but introduces refreshed artwork and updated firmware, which includes a fix to prevent the system from waking while carried in a bag.
> All that to say: M1 is pretty fast, but the reason the battery life is better has to do with everything other than the CPU cores. That's what AMD and Intel are missing.
This isn't true. Yes, uncore power consumption is very important, but so is CPU load efficiency. The faster the CPU can finish a task, the faster it can go back to sleep, aka race to sleep. Apple Silicon is 2-4x more efficient than AMD and Intel CPUs during load while also having higher top-end speed.
Another thing that makes Apple laptops feel way more efficient is that they use a true big.LITTLE design, while AMD's and Intel's little cores are actually designed for area efficiency, not power efficiency. In the case of Intel, they stuff in as many little cores as possible to win MT benchmarks. In real-world applications, the little cores are next to useless because most applications prefer a few fast cores over many slow cores.
This is false; in cross-platform tasks it's on par with, if not worse than, the latest x86 arches. As others pointed out, 2.5 h in gaming is about what you'd expect from a similarly built x86 machine.
They are winning due to lower idle and low-load consumption, which they achieve by integrating everything as much as possible - something that's basically impossible for AMD and Intel.
> The faster the CPU can finish a task, the faster it can go back to sleep, aka race to sleep.
May have been true when CPU manufacturers left a ton of headroom on the V/F curve, but it's not really true anymore. A Zen 4 core's power draw shoots up sharply past 4.6 GHz and nearly triples as you approach 5.5 GHz (compared to 4.6). Are you gonna complete the task 3 times faster at 5.5 GHz?
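To put rough numbers on that (purely a back-of-the-envelope sketch: the ~3x power figure is taken from the comment above, and the best-case speedup is assumed to scale with clock frequency):

    // Illustrative "race to sleep" energy check. All numbers are assumptions
    // based on the comment above, not measurements.
    #include <cstdio>

    int main() {
        const double f_low = 4.6,  p_low = 1.0;   // normalized core power at 4.6 GHz
        const double f_high = 5.5, p_high = 3.0;  // ~3x power near 5.5 GHz

        const double speedup = f_high / f_low;               // ~1.20x at best
        const double t_low = 1.0, t_high = t_low / speedup;  // task runtime
        const double e_low  = p_low  * t_low;                // energy = power * time
        const double e_high = p_high * t_high;

        std::printf("speedup: %.2fx, energy ratio: %.2fx\n",
                    speedup, e_high / e_low);                // ~1.20x, ~2.51x
        return 0;
    }

So even in the best case you finish the task ~20% sooner but spend roughly 2.5x the energy doing it, which is why racing to sleep stops paying off at the top of the V/F curve.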
> This is false; in cross-platform tasks it's on par with, if not worse than, the latest x86 arches.

This is Cinebench 2024, a cross-platform application: https://imgur.com/a/yvpEpKF

> They are winning due to lower idle and low-load consumption, which they achieve by integrating everything as much as possible - something that's basically impossible for AMD and Intel.

Weird, because LNL achieved similar idle wattage to Apple Silicon.[0] Why do you say it's impossible?

> May have been true when CPU manufacturers left a ton of headroom on the V/F curve, but it's not really true anymore. A Zen 4 core's power draw shoots up sharply past 4.6 GHz and nearly triples as you approach 5.5 GHz (compared to 4.6). Are you gonna complete the task 3 times faster at 5.5 GHz?

Honestly not sure how your statement is relevant.

[0] https://www.notebookcheck.net/Dell-XPS-13-9350-laptop-review...
You sure like that table, don't you? Trying to find the source of those Blender numbers, I came across many Reddit posts of yours with that exact same table. Sadly those also don't have a source - they are not from the Notebookcheck source.
For the Blender numbers, the M4 Pro figures came from Max Tech's review.[0] I don't remember where I got the Strix Halo numbers from. Could have been from another YouTube video or some old Notebookcheck article.
Anyway, Blender has official GPU benchmark numbers now:
M4 Pro: 2497 [1]
Strix Halo: 1304 [2]
So the M4 Pro is roughly 90% faster in the latest Blender (2497 / 1304 ≈ 1.9). The most likely reason Blender's official numbers favor the M4 Pro even more is more recent optimizations.
Sources:
[0]https://youtu.be/0aLg_a9yrZk?si=NKcx3cl0NVdn4bwk&t=325
[1] https://opendata.blender.org/devices/Apple%20M4%20Pro%20(GPU...
[2] https://opendata.blender.org/devices/AMD%20Radeon%208060S%20...
And where is LNL now? How's the company that produced it? Even under Pat Gelsinger they said that LNL is a one-off and they're not gonna make any more of them. It's commercially infeasible.
> Honestly not sure how your statement is relevant.
How is you bringing up synthetics relevant to race to idle?
Regardless, a number of things can be done on Strix Halo to improve the performance. The first would be switching to some optimized Linux distro, or at least the kernel. That would claw back 5-20% depending on the task. It would also improve single-core efficiency; I've seen my 7945HX drop from 14-15 W idle on Windows to about 7-8 W on Linux, because Windows likes to jerk off the CCDs nonstop and throw tasks around willy-nilly, which causes the second CCD and the I/O die to never properly idle.
> And where is LNL now? How's the company that produced it? Even under Pat Gelsinger they said that LNL is a one-off and they're not gonna make any more of them. It's commercially infeasible.

Why does it matter that LNL is bad economically? LNL shows that it's definitely possible to achieve the same or even better idle wattage than Apple Silicon.

> How is you bringing up synthetics relevant to race to idle?

I truly don't understand what you mean.

Cool, now compare the M1 to the AI 340. The AI 340 has slightly better single-core and better multi-core performance. If battery life were all about race to idle like you claim, then the AI 340 should be better than the M1.
See also Snapdragon X Elite, which is significantly slower than the AI 340, uses more power under load, so in total has much less efficient cores, and yet still beats the AI 340 on battery life.
This is not true. For high-throughput server software x86 is significantly more efficient than Apple Silicon. Apple Silicon optimizes for idle states and x86 optimizes for throughput, which assumes very different use cases. One of the challenges for using x86 in laptops is that the microarchitectures are server-optimized at their heart.
ARM in general does not have the top-end performance of x86 if you are doing any kind of performance engineering. I don't think that is controversial. I'd still much rather have Apple Silicon in my laptop.
> For high-throughput server software x86 is significantly more efficient than Apple Silicon.

In the server space, x86 has the highest performance right now. Yes. That's true. That's also because Apple does not make server parts. Look for Qualcomm to try to win the server performance crown in the next few years with their Oryon cores. That said, Graviton is at least 50% of all AWS deployments now. So it's winning vs x86.
> ARM in general does not have the top-end performance of x86 if you are doing any kind of performance engineering. I don't think that is controversial.

I think you'll have to define what top-end means and what performance engineering means. But I am purely guessing that ARM has raised their price per core, so it makes less financial sense to do a yearly CPU update. They are also going into the server CPU business, meaning they now have some incentive to keep it all to themselves. Which makes the Nvidia move really smart, as they decided to go for the ISA licence and do it by themselves.
I've worked in video delivery for quite a while.
If I were to write the law, decision-makers wilfully forcing software video decoding where hardware is available would be made to sit on these CPUs with their bare buttocks. If that sounds inhumane, then yes, this is the harm they're bringing upon their users, and maybe it's time to stop turning the other cheek.
Are you telling me that for some reason it's not using any hardware acceleration available while watching YouTube? How do I fix it?
A good demonstration is the Android kernel. By far the biggest difference between it and the stock Linux kernel is power management. Many subsystems down to the process scheduler are modified and tuned to improve battery life.
What are some examples of power draw savings that Linux is leaving on the table?
“Modern Standby” could be made to actually work, ACPI states could be fixed, a functional wake-up state built anew, etc. Hell, while it would allow pared down CPUs, you could have a stop-gap where run mode was customized in firmware.
Too much credit is given to Apple for “owning the stack” and too little attention to legacy x86 cruft that allows you to run classic Doom and Commander Keen on modern machines.
Where do you get this from? I could understand that they could get rid of the die area devoted to x86 decoding, but as I understand it x86 and x86-64 instructions get interpreted by the same execution units, which are bitness blind. What makes you think it's x86 support that's responsible for the vast majority of power inefficiency in x86-64 processors?
Reduced I-Cache, uop cache, and decoder pressure would also have a beneficial impact. On the flip side, APX instructions would all be an entire byte longer than their AMD64 counterparts, so some of the benefits would be more muted than they might first appear and optimizing between 16 registers and shorter instructions vs 32 registers with longer instructions is yet another tradeoff for compilers to make (and takes another step down the path of being completely unoptimizable by humans).
I'm confused, how is any of this related to "x86" and not the diverse array of third party hardware and software built with varying degrees of competence?
To be fair, usually Linux itself has hardware acceleration available, but the browser vendors tend to disable GPU rendering except on controlled/known perfectly working combinations of OS/hardware/drivers, and they do much less testing on Linux. In most cases you can force-enable GPU rendering in about:config, try it out yourself, and leave it on unless you get recurring crashes.
All the Blink-based ones just work, as long as the proper libraries are installed and said libraries properly detect hardware support.
If you manually go in and limit a modern Windows laptop's max performance to just under what the spec sheet indicates, it'll be fairly quiet and cool. In fact, most have a setting to do this, but it's rarely on by default because the manufacturers want to show off performance benchmarks. Of course, that's while also touting battery life that is not possible when in the mode that allows the best performance...
This doesn't cover other stupid battery-life eaters like Modern Standby (it's still possible to disable it with registry tweaks! do it!), but if you don't need absolute max perf for renders or compiling or whatever, put your Windows or Linux laptop into "cool & quiet" mode and enjoy some decent extra battery.
It would also be really interesting to see what Apple Silicon could do under some Extreme OverClocking fun with sub-zero cooling or such. It would require a firmware & OS that allows more tuning and tweaking, so it's not going to happen anytime soon, but it could actually be a nice brag for Apple if they did let it happen.
Incredible discipline. The Chrome graph in comparison was a mess.
Looks like general purpose CPUs are on the losing train.
Maybe Intel should invent desktop+mobile OS and design bespoke chips for those.
I assume this is referring to the tweet from the launch of the M1 showing off that retaining and releasing an NSObject is like 3x faster. That's more of a general case of the ARM ISA being a better fit for modern software than x86, not some specific optimization for Apple's software.
x86 was designed long before desktops had multi-core processors and out-of-order execution, so for backwards compatibility reasons the architecture severely restricts how the processor is allowed to reorder memory operations. ARM was designed later, and requires software to explicitly request synchronization of memory operations where it's needed, which is much more performant and a closer match for the expectations of modern software, particularly post-C/C++11 (which have a weak memory model at the language level).
Reference counting operations are simple atomic increments and decrements, and when your software uses these operations heavily (like Apple's does), it can benefit significantly from running on hardware with a weak memory model.
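A minimal sketch of the kind of operation being described, using plain C++ atomics (this is not Apple's actual retain/release implementation, just an illustration of why the memory model matters):

    // Toy refcount using C++ atomics. The increment only needs relaxed
    // ordering; on AArch64 that can compile to a plain LDADD with no barrier,
    // while on x86 any atomic RMW is a LOCK-prefixed instruction that also
    // acts as a full memory fence, so you pay for ordering you didn't ask for.
    #include <atomic>

    struct RefCounted {
        std::atomic<long> refs{1};

        void retain() {
            refs.fetch_add(1, std::memory_order_relaxed);
        }

        void release() {
            // The final decrement must synchronize with all earlier writes
            // before the object is destroyed, hence acq_rel only here.
            if (refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
                delete this;
            }
        }
    };

    int main() {
        auto* obj = new RefCounted(); // refs == 1
        obj->retain();                // refs == 2
        obj->release();               // refs == 1
        obj->release();               // refs == 0 -> deleted
        return 0;
    }

The relaxed increment is exactly the case where a weakly ordered core can avoid paying for a full barrier on every retain.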
It's not really even the ISA, mainly the implementation. Atomics on Apple cores are 3x faster than on Intel (18 cycles of back-to-back latency on Intel vs 6 on Apple). AMD's atomics have 6-cycle latency.
It seems if you want optimal performance and power efficiency, you need to own both hardware and software.
Does Apple optimize the OS for its chips and vice versa? Yes. However, Apple Silicon hardware is just that good and that far ahead of x86. Here's an M4 Max running macOS running Parallels running Windows when compared to the fastest AMD laptop chip: https://browser.geekbench.com/v6/cpu/compare/13494385?baseli...
M4 Max is still faster even with 14 out of 16 possible cores being used. You can't chalk that up to optimizations anymore because Windows has no Apple Silicon optimizations.
Intel is busy fixing up their shit after what happened with their 13th & 14th gen CPUs. Imagine them making an OS - call it IntelOS - where the only thing it runs on is an Intel CPU.
Or, contribute efficiency updates to popular open projects like firefox, chromium, etc...
Wouldn't it be easier for Intel to heavily modify the Linux kernel instead of writing their own stack?
They could even go as far as writing the sleep utilities for laptops, or even their own window manager to take advantage of the specific mods in the ISA?
If it hadn't been killed, it may have become something interesting today.
At least on mobile platforms, Apple advocates the other way with race to sleep - do the calculation as fast as you can with powerful cores so that the whole chip can go back to sleep earlier and take naps more often.
But when Apple says it, software devs actually listen.
The other aspect of it is that paid software is more prevalent in macOS land, and the prices are generally higher than on Windows. But the flip side of that is that user feedback is taken more seriously.
Like, would I prefer an older-style MacBook overall, with an integrated card reader, HDMI port, Ethernet jack, all that? Yeah, sure. But to get that now I have to go to a PC laptop, and there are so many compromises there. The battery life isn't even in the same zip code as a Mac, they're much heavier, the chips run hot even just doing web browsing let alone any actual work, and they CREAK. Like, my god, I don't remember the last time I had a Windows laptop open and it wasn't making all manner of creaks and groans and squeaks.
The last one would be solved I guess if you went for something super high end, or at least I hope it would be, but I dunno if I'm dropping $3k+ either way I'd just as soon stay with the Macbook.
Modern MacBook Pros have 2/3 (card reader and HDMI port), and they brought back my beloved MagSafe charging.
If you fully load the CPU and calculate how much energy an AI 340 needs to perform a fixed workload and compare that to an M1, you'll probably find similar results, but that only matters for your battery life if you're doing things like Blender renders, big compiles or gaming.
Take for example this battery life gaming benchmark for an M1 Air: https://www.youtube.com/watch?v=jYSMfRKsmOU. 2.5 hours is about what you'd expect from an x86 laptop, possibly even worse than the fw13 you're comparing here. But turn down the settings so that the M1 CPU and GPU are mostly idle, and bam you get 10+ hours.
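The arithmetic behind that is simple (illustrative numbers only - the battery capacity and power draws below are assumptions, not measurements):

    // Battery life is just capacity / average draw, so the same pack gives
    // wildly different runtimes depending on how close to idle the SoC stays.
    #include <cstdio>

    int main() {
        const double battery_wh   = 50.0;  // assumed M1 Air-class pack
        const double gaming_w     = 20.0;  // assumed sustained CPU+GPU draw
        const double light_use_w  = 5.0;   // assumed mostly-idle SoC + display

        std::printf("gaming:    %.1f h\n", battery_wh / gaming_w);    // 2.5 h
        std::printf("light use: %.1f h\n", battery_wh / light_use_w); // 10.0 h
        return 0;
    }

Halve the average draw and you double the runtime, which is why idle and uncore power dominate battery life for light use.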
Another example would be a ~5-year-old mobile Qualcomm chip. It's on a worse process node than an AMD AI 340, much much slower, and has significantly worse performance per watt, and yet it barely gets hot and sips power.
All that to say: M1 is pretty fast, but the reason the battery life is better has to do with everything other than the CPU cores. That's what AMD and Intel are missing.
> If I open too many tabs in Chrome I can feel the bottom of the laptop getting hot, open a YouTube video and the fans will often spin up.
It's a fairly common issue on Linux to be missing hardware acceleration, especially for video decoding. I've had to enable GPU video decoding on my fw16 and haven't noticed the fans on YouTube.