- > Did they actually need all variables to be stable at the point of any memory access?
One of the most important optimizations a compiler can do is keeping a variable in a register and never letting it hit memory in the first place. If every variable must get its own RAM address, and the value at that RAM address must be faithful to the variable's "true" value at any given instruction, we should expect our software to slow down by an order of magnitude or two.
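To make that concrete, here's a minimal sketch (my example, not theirs): the only difference between these two functions is `volatile`, which is roughly what forcing a variable to be "stable in memory" at every instruction costs you.

```cpp
#include <cassert>

// Plain local: the compiler keeps `sum` in a register for the whole loop.
long sum_register(const int* a, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += a[i];
    return sum;
}

// volatile local: `sum` must live at a real address, and every iteration
// becomes a load from memory, an add, and a store back to memory.
long sum_memory(const int* a, int n) {
    volatile long sum = 0;
    for (int i = 0; i < n; ++i) sum = sum + a[i];
    return sum;
}
```

Same result, very different codegen: compare the two at `-O2` on Compiler Explorer and the second one is all loads and stores.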
- > I wouldn't be surprised if their code is also vibe coded slop.
That's my takeaway from this too. I think they tried the first thing the LLM suggested, it didn't work, they asked the LLM to fix it, and ended up with this crap. They never tried to really understand the problems they were facing.
Video is really fiddly. You have all sorts of parameters to fiddle with. If you don't dig into that and figure out what tradeoffs you need to make, you'll easily end up in the position where *checks notes* you think you need 40Mbps for 1080p video and 10Mbps is just too shitty.
There are various points in the article where they talk about having 30 seconds of latency. Whatever's causing this, it's a solved problem. We all have experience with video teleconferencing; this isn't anything new, it's nothing special, they're just doing it wrong. They say it doesn't work because of corporate network policy, but we all use Teams or Slack.
I think you're right. They just did a bunch of LLM slop and decided to just send it. At no point did they understand any of their problems any deeper than the LLM tried to understand the problem.
- > non-technical people
It's also better for the technical people. If you self-host and the DB goes down at 2am on a Sunday morning, all the technical people are gonna get woken up, and they'll be working on it until it's fixed.
If us-east goes down a technical person will be woken up, they'll check downdetector.com, and they'll say "us-east is down, nothin' we can do" and go back to sleep.
- Trivially. The zip format's own headers (the central directory, at the end of the file) specify where each file's data is. All other bytes are ignored.
That's how self-extracting archives and installers work while also being valid zip files. The extractor part is just a regular executable stapled to the front: a zip decompressor that decompresses itself.
This is specific to zip files, not the deflate algorithm.
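For the curious, the "other bytes are ignored" part hinges on the End Of Central Directory record, which readers locate by scanning backwards from the end of the file, so anything prepended (like a self-extractor stub) is never even looked at. A hypothetical minimal finder (ignoring zip64 and the variable-length comment field) looks like:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <vector>

// Scan backwards for the End Of Central Directory signature, which is
// 0x06054b50 stored little-endian: the bytes "PK\x05\x06".
// Returns the offset of the record, or -1 if the buffer isn't a zip.
long find_eocd(const std::vector<uint8_t>& buf) {
    if (buf.size() < 22) return -1;           // EOCD is at least 22 bytes
    for (size_t i = buf.size() - 22; ; --i) {
        if (buf[i] == 0x50 && buf[i + 1] == 0x4b &&
            buf[i + 2] == 0x05 && buf[i + 3] == 0x06)
            return static_cast<long>(i);
        if (i == 0) break;
    }
    return -1;
}
```

Real readers also have to cope with a trailing archive comment (up to 64KB after the EOCD), which is why they scan rather than seeking to a fixed offset.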
- At work, our daily build (actually 4x per day) is a handful of zip files totaling some 7GB. The script to get the build would copy the archives over the network, then decompress then into your install directory.
This worked great on campus, but when everyone went remote during COVID it didn't anymore. It went from three minutes to like twenty minutes.
However: most files change only rarely. I don't need all the files, just the ones which are different. So I wrote a scanner thing which compares each zip entry's filesize and checksum to those of the local file. If they're the same, we skip it; otherwise, we decompress it out of the zip file. This cut the time to get the daily build from 20 minutes to 4 minutes.
Obviously this isn't resilient to an attacker, crc32 is not secure, but as an internal tool it's awesome.
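The skip test itself can be tiny, because the zip central directory already stores each entry's uncompressed size and CRC-32, so no decompression is needed to decide. A sketch of that logic (my reconstruction, not the actual internal tool), with a table-free CRC-32:

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Bitwise CRC-32 (the zlib/zip polynomial, reflected form). A table-driven
// version is faster, but this is the same function zip stores per entry.
uint32_t crc32_bitwise(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Skip an entry iff both the size and the CRC stored in the zip's central
// directory match what's already on disk.
bool can_skip(uint64_t entry_size, uint32_t entry_crc,
              uint64_t local_size, uint32_t local_crc) {
    return entry_size == local_size && entry_crc == local_crc;
}
```

Checking the size first is the cheap early-out; you only pay for reading and CRC-ing the local file when the sizes already agree.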
- "Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can’t break."--Bruce Schneier
There's a corollary here with LLMs, but I'm not pithy enough to phrase it well. Anyone can create something using LLMs whose hallucinations they, themselves, aren't skilled enough to spot. Or something.
LLMs are incredibly good at exploiting people's confirmation biases. If it "thinks" it knows what you believe/want, it will tell you what you believe/want. There does not exist a way to interface with LLMs that will not ultimately end in the LLM telling you exactly what you want to hear. Using an LLM in your process necessarily results in being told that you're right, even when you're wrong. Using an LLM necessarily results in it reinforcing all of your prior beliefs, regardless of whether those prior beliefs are correct. To an LLM, all hypotheses are true, it's just a matter of hallucinating enough evidence to satisfy the user's skepticism.
I do not believe there exists a way to safely use LLMs in scientific processes. Period. If my belief is true, and ChatGPT has told me it's true, then yes, AI, the tool, is the problem, not the human using the tool.
- I would recommend Fedora only hesitantly.
1. Fedora's release cycle is usually a little over a year from final release to EOL, at which point you need to upgrade. My Mom and Dad ain't gonna wanna do that. For them, better to install the latest Ubuntu LTS once, and then I can upgrade it for them at Christmas in 4 years.
2. Fedora is usually a bit...evangelical about open source software. If one of the things you really want is closed source, you'll have to take a few extra steps. Notably Nvidia drivers, but also stuff like Discord or Steam.
3. Fedora tends to move fast and break things. They tend to adopt things before they're good and ready. I believe Fedora was the first to switch to Wayland, and they did so before it was really ready, but I might be mistaken.
For a lot of users, #1 and #3 above are good things; they want the latest and greatest stuff, but don't want the occasional breakages that result from using a rolling release distro like Arch or Gentoo. For a lot of users, notably my Mom and Dad, they don't want to deal with shit like that, they just want to turn their computer on and forward funny pictures to me and their friends and do their word puzzles.
Fedora is a great distro, and it's the perfect distro for a lot of people, but some of its core philosophical principles make it a suboptimal distro for the less computer literate.
- They're called NPUs, and all recent CPUs from Intel, AMD, and Apple have them. They're actually reasonably power efficient. All flagship smartphones have them, and several models down the line as well.
IIRC Linux drivers are pretty far behind, because no one who works on Linux stuff is particularly interested in running personal info like screenshots or mic captures through a model and uploading the telemetry. While in general I get annoyed when my drivers suck, in this particular case I don't care.
- In Southern California it costs $120 just for a guy to come out and look at your HVAC. Not fix anything--not install anything--just to look at it and give you an estimate for how much the repair is going to cost. I went to the website for a local installer and they give a ballpark of $13,000-$25,000 for a heat pump installation.
I don't know why it's so expensive here. It shouldn't be, it makes no sense. But it is.
- One of the demos was printing a thing out, but the processor was hopelessly too slow to perform the actual print job. So they hand unrolled all the code to get it down from something like a 30 minute print job to a 30 second print job.
I think at this point it should be expected that every publicly facing demo (and most internal ones) are staged.
- I've thankfully never had my house robbed, or a cell phone or laptop stolen. I have had my car broken into. The thieves chucked a paving stone through the window, grabbed a backpack sitting on the passenger's seat, and ran off with it. Left the paving stone in the driver's seat. The backpack had my gym clothes in it. A T-shirt I was rather fond of, a pair of shorts, a few extra pairs of socks, and a shitty pair of sneakers, all were well worn.
Replacing the backpack and gym clothes was probably $100, market value was maybe $10, and it was $507 to fix the window. (my deductible was $500.)
- > With a copyright, people are allowed to do anything similar to you, so long as they do not derive their work from yours.
John C. Fogerty famously got sued for sounding too similar to John C. Fogerty. (His old label, Fantasy Records, claimed "The Old Man Down the Road" infringed "Run Through the Jungle"--a song Fogerty also wrote.)
- > The universe [...] doesn't require infinite precision,
Doesn't it though?
What happens when three bodies in a gravitationally bound system orbit each other? Our computers can't precisely compute their interaction because our computers have limited precision and discrete timesteps. Even when we discard such complicated things as relativity, what with its Lorentz factors and whatnot.
Nature can perfectly compute their interactions because it has smooth time and infinite precision.
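You can watch the limited-precision half of that claim directly: integrate the simplest possible orbit with discrete timesteps, then halve the timestep and integrate again over the same total simulated time. The two answers disagree, because neither one is the true trajectory. (Illustrative sketch with made-up units; forward Euler is deliberately the crudest integrator.)

```cpp
#include <cassert>
#include <cmath>

struct State { double x, y; };

// Forward-Euler integration of a planet around a unit-GM "sun" at the
// origin, starting on a circular orbit. dt * steps = total simulated time.
State simulate(double dt, int steps) {
    double x = 1.0, y = 0.0, vx = 0.0, vy = 1.0;
    for (int i = 0; i < steps; ++i) {
        double r3 = std::pow(x * x + y * y, 1.5);
        double ax = -x / r3, ay = -y / r3;   // inverse-square attraction
        x += vx * dt; y += vy * dt;
        vx += ax * dt; vy += ay * dt;
    }
    return {x, y};
}
```

Run `simulate(0.01, 1000)` and `simulate(0.005, 2000)` (same total time, half the step) and the final positions differ measurably; refining the step forever only converges toward the answer nature gets for free.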
- 1/sqrt(x) is complicated. Imagine that instead of computing 1/sqrt(x), you wanted to compute exp_2(-.5 log_2(x)). Also imagine you have an ultra fast way to compute exp_2 and log_2. If you do, then exp_2(-.5 log_2(x)) is gonna be fast to compute.
It turns out you do have an ultra fast way to compute log_2: you bitcast the float to an integer, and then twiddle some bits. The 8 bits after the sign bit (which is obviously zero, because we're assuming our input is positive) are the exponent, and the trailing 23 bits are a linear interpolation between 2^n and 2^(n+1). exp_2 is the same but in reverse.
You could simply convert that integer to floating point, multiply by -.5, then convert back to integer. But the multiply by -.5 can also be applied to the bit pattern directly, while it's still an integer; that's more complicated, and you'll need some arithmetic and some magic numbers.
So you're bitcasting to an integer, twiddling some bits, twiddling some bits, twiddling some bits, twiddling some bits, and bitcasting to a float. It turns out that all the bit twiddling simplifies if you do all the legwork, but that's beyond the scope of this post.
So there you go. You've computed exp_2(-.5 log_2 x). You're done. Now you need to figure out how to apply that knowledge to the inverse square root.
It just so happens that 1/sqrt(x) and exp(-.5 log x) are the same function. exp(-.5 log x) = exp(log(x^-.5)) = x^-.5 = 1/sqrt(x).
Any function where the hard parts are computing log_2 or exp_2 can be accelerated this way. For instance, x^y is just exp_2(y log_2 x).
Note that in fast inverse square root, you're not doing Newton's method on the integer part, you're doing it on the floating point part. Newton's method doesn't need to be done at all, it just makes the final result more accurate.
Here's a blog here that gets into the nitty gritty of how and why it works, and a formula to compute the magic numbers: https://h14s.p5r.org/2012/09/0x5f3759df.html
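Putting it all together, the famous routine that post analyzes compresses all the bit twiddling into a single shift-and-subtract (sketched here with `memcpy` bitcasts for strict-aliasing safety):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// The classic fast inverse square root: approximate exp_2(-.5 log_2 x)
// in integer land, then one optional Newton step on the float.
float fast_rsqrt(float x) {
    uint32_t i;
    std::memcpy(&i, &x, sizeof i);       // bitcast float -> int (~ log_2)
    i = 0x5f3759df - (i >> 1);           // the multiply by -.5, plus magic
    float y;
    std::memcpy(&y, &i, sizeof y);       // bitcast int -> float (~ exp_2)
    y = y * (1.5f - 0.5f * x * y * y);   // Newton step: improves accuracy
    return y;
}
```

The `i >> 1` is the multiply by 1/2 on the log, and the subtraction from `0x5f3759df` handles both the negation and the exponent-bias bookkeeping; the linked post derives where that constant comes from.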
- Accept-reject methods are nonstarters when the architecture makes branching excessively expensive, specifically SIMD and GPU, which is one of the domains where generating random points on a sphere is particularly useful.
The Box-Muller transform is slow because it requires log, sqrt, sin, and cos. Depending on your needs, you can approximate all of these.
log2 can be easily approximated using fast inverse square root tricks (conveniently, this also negates the need to ensure your input is not zero):

```cpp
constexpr float fast_approx_log2(float x) {
    x = std::bit_cast<int, float>(x);   // reinterpret the bits, then int -> float
    constexpr float a = 1.0f / (1 << 23);
    x *= a;
    x -= 127.0f;
    return x;
}
```

sqrt is pretty fast; turn `-ffast-math` on. (This is already the default on GPUs.) (Remember that you're normalizing the resultant vector, so add this to the mag_sqr before square rooting it.)
The slow part of sin/cos is precise range reduction. We don't need that. The input to sin/cos in Box-Muller is by construction in the range [0, 2pi]. Range reduction is a no-op.
For my particular niche, these approximations and the resulting biases are justified. YMMV. When I last looked at it, the fast log2 gave a bunch of linearities where you wanted it to be smooth, however across multiple dimensions these linearities seemed to cancel out.
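For reference, here's the full-precision, branch-free construction those approximations get swapped into: draw Gaussians via Box-Muller, then normalize. (A plain standard-library sketch of mine, not production SIMD/GPU code.)

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Branch-free point on the unit sphere: 4 uniforms -> 3 standard normals
// via Box-Muller, then normalize. No rejection loop, so every lane of a
// SIMD/GPU batch does identical work.
void random_on_sphere(std::mt19937& rng, float out[3]) {
    std::uniform_real_distribution<float> uni(1e-7f, 1.0f);  // avoid log(0)
    const float tau = 6.2831853f;
    float u1 = uni(rng), u2 = uni(rng), u3 = uni(rng), u4 = uni(rng);
    float r1 = std::sqrt(-2.0f * std::log(u1));   // Box-Muller radii
    float r2 = std::sqrt(-2.0f * std::log(u3));
    float g[3] = { r1 * std::cos(tau * u2),
                   r1 * std::sin(tau * u2),
                   r2 * std::cos(tau * u4) };      // 4th normal discarded
    float inv = 1.0f / std::sqrt(g[0]*g[0] + g[1]*g[1] + g[2]*g[2]);
    for (int i = 0; i < 3; ++i) out[i] = g[i] * inv;
}
```

Every transcendental here (`log`, `sqrt`, `sin`, `cos`) is a candidate for the approximations above, and note the `tau * u` arguments really are confined to [0, 2pi].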
- Another datapoint that supports your argument is the Grand Theft Auto Online (GTAO) thing a few months ago.[0] GTAO took 5-15 minutes to start up. Like you click the icon and 5-15 minutes later you're in the main menu. Everyone was complaining about it for years. Years. Eventually some enterprising hacker disassembled the binary and profiled it. 95% of the runtime was in `strlen()` calls. Not only was that where all the time was spent, but it was all spent `strlen()`ing the exact same ~10MB resource string. They knew exactly how large the string was because they allocated memory for it, and then read the file off the disk into that memory. Then they were tokenizing it in a loop. But their tokenization routine didn't track how big the string was, or where the end of it was, so for each token it popped off the beginning, it had to `strlen()` the entire resource file.
The enterprising hacker then wrote a simple binary patch that reduced the startup time from 5-15 minutes to like 15 seconds or something.
To me that's profound. It implies that not only was management not concerned about the start up time, but none of the developers of the project ever used a profiler. You could just glance at a flamegraph of it, see that it was a single enormous plateau of a function that should honestly be pretty fast, and anyone with an ounce of curiosity would be like, ".........wait a minute, that's weird." And then the bug would be fixed in less time than it would take to convince management that it was worth prioritizing.
It disturbs me to think that this is the kind of world we live in. Where people lack such basic curiosity. The problem wasn't that optimization was hard (optimization can be extremely hard); it was that nobody gave a shit and nobody was even remotely curious about bad performance. They just accepted bad performance as if that's just the way the world is.
[0] Oh god it was 4 years ago: https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
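The shape of the bug is easy to reproduce. Both functions below count comma-separated tokens; the first re-`strlen()`s the entire remaining buffer for every token, which is the O(n^2) pattern, while the second just walks the string once. (Toy reconstruction of the pattern, not the actual GTAO code.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The bug shape: one strlen() over the WHOLE remaining input per token.
// For n tokens in an n-byte buffer, that's O(n^2) bytes scanned.
size_t count_tokens_quadratic(const char* s) {
    size_t count = 0;
    while (*s) {
        size_t remaining = std::strlen(s);  // walks everything, every time
        (void)remaining;                    // (the real code used the result)
        while (*s && *s != ',') ++s;        // consume one token
        if (*s == ',') ++s;
        ++count;
    }
    return count;
}

// The fix: don't re-ask where the end is. One pass, O(n).
size_t count_tokens_linear(const char* s) {
    size_t count = 0;
    for (const char* p = s; *p; ) {
        while (*p && *p != ',') ++p;
        if (*p == ',') ++p;
        ++count;
    }
    return count;
}
```

On a 10MB resource file with ~63k entries, that hidden quadratic is the difference between seconds and minutes, and it's a single plateau on any flamegraph.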
- When you look at, for instance, a bowl, or even one of those egg carton mattress things, and you want to find the global minimum, you are looking at a surface which is 2 dimensions in and 1 dimension out. It's easy enough for your brain to process several thousand points and say ok the bottom of the bowl is right here.
When a computer has a surface which is 2 dimensions in and 1 dimension out, you can actually just do the same thing. Check like 100 values in each of the x/y directions and you only have to evaluate like 10,000 points. A computer can do that easy peasy.
When a computer does ML with a deep neural network, you don't have 2 dimensions in and 1 dimension out. You have thousands to millions of dimensions in and thousands to millions of dimensions out. If you have 100000 inputs, and you check 1000 values for each input, the total number of combinations is 1000^100000. Then remember that you also have 100000 outputs. You ain't doin' that much math. You ain't.
So we need fancy stuff like Jacobians and backtracking.
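Here's the 2-D brute force for scale (a toy example of mine): a 101 x 101 grid finds the bottom of a bowl instantly, while the exact same code with 100,000 inputs would need 101^100000 evaluations, which is why we reach for gradients instead.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };

// Exhaustive grid search: evaluate f on an (n+1) x (n+1) grid over
// [lo, hi] x [lo, hi] and keep the smallest value seen. Cost grows as
// (n+1)^d in the input dimension d -- fine for d=2, hopeless for d=100000.
template <typename F>
Point grid_min(F f, double lo, double hi, int n) {
    Point best{lo, lo};
    double best_v = f(lo, lo);
    for (int i = 0; i <= n; ++i) {
        for (int j = 0; j <= n; ++j) {
            double x = lo + (hi - lo) * i / n;
            double y = lo + (hi - lo) * j / n;
            double v = f(x, y);
            if (v < best_v) { best_v = v; best = Point{x, y}; }
        }
    }
    return best;
}
```

Gradient methods sidestep the exponential blowup by asking the surface which way is downhill at one point, instead of sampling everywhere.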
- > 100% agree, but what's frustrating is that "the left" are not much better. We get things like the rewriting[1] of Roald Dahl's books based on the feedback from "sensitivity readers".
You realize that's significantly better, right? Like at least two orders of magnitude better?
In one case, the copyright holder of Roald Dahl's books decided to censor (incorrectly, I agree) their own books which they own the copyright to. That's a private organization doing a stupid thing, making their own content worse. A private organization censoring their own words. No elected officials or persons appointed by elected officials were involved.
In the other case, the government is unilaterally deciding to withhold information from the public. The government is censoring other people's words.
You realize how that's not even remotely similar, right? "The left" is way better on this one.
- In 2020, China had 253GW of total solar capacity. In 2024, China installed 277GW of additional solar capacity. In June, they were up to 1,100GW of total solar capacity. In 2023, 60% of the global solar installation happened in China.
China is so far ahead of the world in green energy it's ridiculous. We can't pretend that it's still 1995 and China still gets most of its energy from coal anymore.
- It would be a hard sell to finger Steam as being at high risk of fraud. Steam has a very generous refund policy, and if you don't consider it generous enough, and chargeback a purchase on your Steam account, they just lock it (and access to all your games) until you pay them.
I don't have insider information about how often Steam gets hit with chargebacks alleging fraud, but I can't imagine it's a significant percentage.
- > Don’t get me wrong: it’s progress. But it’s far from a panacea.
Progress in this day and age is great. Progress right now is at least 2 orders of magnitude better than patiently waiting for a panacea.
- Yeah. There's a laundry list of innovations over the years where someone invents a technique, shows how it improves how a scene looks, and then for the next few years everyone turns it up to 11 and it looks like shit. Bloom, SSAO, lens flare, film grain, vignetting, DoF.
After a while people turn it back down to like a 4 and it improves things.
- No.
When you see "artist's impression" in a news article about space, what you're looking at is a painting or drawing created from whole cloth by an artist.
This article is about how sensors turned signals into images. When you take pictures with a 'normal' camera, we've designed them so that if you take certain steps, the image on your screen looks the same as what it would look like in real life with no camera or monitor. This article is stating that with the cameras and filters they use for telescopes, that same process doesn't really work. We use special filters to measure specific spectral properties about an astronomical object. This gives good scientific information, however, it means that in many cases it's impossible to reconstruct what an astronomical object would really look like if our eyes were more sensitive and we looked at it.
- Like fine, they're gonna make a distro that only uses software under one of the FSF's free-as-in-freedom copyleft open source licenses, excluding not just closed source software, but also binary blob device firmware and software distributed under one of those filthy permissive licenses. That's great. It's fucking unusable, but it's awesome that it exists and it's great that they're doing it.