- If you have many metrics that could possibly be construed as "this was what we were trying to improve", that's many different possibilities for random variation to give you a false positive. If you're explicit at the start of an experiment that only a single metric counts as success, it turns any other results you get into "hmm, this is an interesting pattern that merits further exploration" and not "this is a significant result that confirms whatever I thought at the beginning."
It's basically a variation on the multiple comparisons problem, but sneakier: it's easy to spend an hour going through data and, over that time, test dozens of different hypotheses. At that point, whatever p-value you'd compute for a single comparison isn't relevant, because after that many comparisons you'd expect at least one to come in under an uncorrected p = 0.05 by random chance alone.
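To put rough numbers on that (a back-of-envelope sketch, assuming the comparisons are independent):

```python
# Chance of at least one false positive when running k independent tests,
# each at an uncorrected significance level alpha = 0.05.
alpha = 0.05
for k in (1, 5, 10, 20, 50):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} comparisons -> P(at least one false positive) = {p_any:.2f}")
```

By a couple dozen informal comparisons, a "significant" hit is closer to the expected outcome than to a surprise.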
- Advantage, sure. I just don't think that advantage is particularly meaningful in situations a human has virtually no chance of escaping. Humans also have a lot of their own advantages. How is a chatbot supposed to cross an air gap unless you assume it has what I consider unrealistic levels of persuasion?
I think you also have to consider that AI with superpowers is not going to materialize overnight. If superintelligent AI is on the horizon, the first such AI will be comparable to very capable humans (who do not have the ability to talk their way into nuclear launch codes or out of decades-long prison sentences at will). Energy costs will still be tremendous, and just keeping the system going will require enormous levels of human cooperation. The world will change a lot in that kind of scenario, and I don't know how reasonable it is to claim anything more than the observation of potential risks in a world so different from the one we know.
Is it possible that search ends up doing as much for persuasion as it does for chess, superintelligent AI happens relatively soon, and it doesn't have prohibitive energy costs such that escape is a realistic scenario? I suppose? Is any of that obvious or even likely? I wouldn't say so.
- If someone who is so good at manipulation their life is adapted into a movie still ends up serving decades behind bars, isn't that actually a pretty good indication that maxing out Speech doesn't give you superpowers?
AI that's as good as a persuasive human at persuasion is clearly impactful, but I certainly don't see it as self-evident that you can just keep drawing the line out until you end up with a 200 IQ AI that manipulates its environment so easily that it's not worth elaborating how, exactly, a chatbot is supposed to act on the world through its extremely limited interfaces with the outside world.
- I don't think there's a confident upper bound. I just don't see why it's self-evident that the upper bound is beyond anything we've ever seen in human history.
- People are hurt by animals all the time: do you think having a higher IQ than a grizzly bear means you have nothing to fear from one?
I certainly think it's possible to imagine that an AI that says the exactly correct thing in any situation would be much more persuasive than any human. (Is that actually possible given the limitations of hardware and information? Probably not, but it's at least not on its face impossible.) Where I think most of these arguments break down is the automatic "superintelligence = superpowers" analogy.
For every genius who became a world-famous scientist, there are ten who died in poverty or war. Intelligence doesn't correlate with the ability to actually impact our world as strongly as people would like to think, so I don't think it's reasonable to extrapolate that outwards to a kind of intelligence we've never seen before.
- Why is 2) "self-evident"? Do you think it's a given that, in any situation, there's something you could say that would manipulate humans to get what you want? If you were smart enough, do you think you could talk your way out of prison?
- Thanks for sharing this proof! As someone who enjoys math but never got myself through enough Galois theory to finish the standard proof, it's fantastic to see a proof that's more elementary while still giving a sense of why the group structure is important.
- At that point, you'd be better off just using a recursive algorithm like the one in GMP. You're swapping out arbitrary-length integers for arbitrary-precision floats.
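For concreteness, here's a minimal sketch of a fast-doubling recursion in that spirit, in plain Python with exact integers (GMP's mpz_fib_ui is heavily optimized C built on related identities, so treat this as an illustration rather than its actual implementation):

```python
def fib_pair(n: int) -> tuple[int, int]:
    """Return (F(n), F(n+1)) by fast doubling, using exact integer arithmetic."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n >> 1)   # (F(k), F(k+1)) with k = n // 2
    c = a * (2 * b - a)       # F(2k)
    d = a * a + b * b         # F(2k + 1)
    return (d, c + d) if n & 1 else (c, d)


def fib(n: int) -> int:
    return fib_pair(n)[0]


print(fib(100))  # 354224848179261915075, exact
```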
- The compound-interest intro to e (the value of 1 dollar compounded continuously for a year at 100% interest), to me, has several useful advantages over different introductions that are more mathematically rich:
- It's elementary to the point that you can introduce it whenever you want.
- It automatically gives a sense of scale: larger than 2, but not by a lot.
- At least to me, it confers some sense of importance. You can get the sense that this number e has some deep connection to infinity and infinitesimal change and deserves further study even if you haven't seen calculus before.
    - It directly suggests a way of calculating e (see the sketch below), which "the base of the exponential function with derivative equal to itself" doesn't suggest as cleanly.
I don't know of any calculus course that relies on this definition for much: that's not its purpose. The goal is just to give students a fairly natural introduction to the constant before you show that e^x and ln x have their own unique properties that will be more useful for further manipulation.
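As a quick illustration of that last point about calculating e, a minimal sketch of the compound-interest limit:

```python
# e as the limit of compound interest: $1 at 100% annual interest,
# compounded n times per year, grows to (1 + 1/n) ** n dollars.
for n in (1, 2, 12, 365, 10_000, 1_000_000):
    print(f"n = {n:>9}: {(1 + 1 / n) ** n:.6f}")
# Creeps up toward 2.718281...: larger than 2, but not by a lot.
```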
- I will die on the hill that TOML should be used for the vast majority of what YAML's used for today. There are times a full language is needed, but I've seen so many YAML files that use none of the features YAML has with all of the footguns.
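One concrete footgun, as a small sketch (assuming PyYAML for the YAML side and Python 3.11+'s tomllib for the TOML side): YAML 1.1-style parsers resolve a bare NO to a boolean, the classic "Norway problem," while TOML requires strings to be quoted, so the ambiguity can't arise.

```python
import tomllib  # standard library in Python 3.11+

import yaml  # PyYAML, assumed installed

# A YAML 1.1-style parser resolves bare NO to a boolean.
print(yaml.safe_load("country: NO"))    # {'country': False}

# TOML requires the string to be quoted, so there's nothing to misread.
print(tomllib.loads('country = "NO"'))  # {'country': 'NO'}
```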
- They then say there's an approximation for Fibonacci, which makes me think that's what they're calling Binet's formula. (I'd also expect an author with this mathematical sophistication to be aware of Binet's formula, but maybe I'm projecting.)
- I don't think it's controversial to say that asymptotic analysis has flaws: the conclusions you draw from it only hold in the limit of larger inputs, and sometimes "larger" means "larger than anything you'd be able to run it on." Perhaps as Moore's law dies we'll be increasingly able to talk more about specific problem sizes in a way that won't become obsolete immediately.
I suppose my question is why you think TCS people would do this analysis and development better than non-TCS people. Once you leave the warm cocoon of big-O, the actual practical value of an algorithm depends hugely on specific hardware details. Similarly, once you stop dealing with worst-case or naive average-case complexity, you have to try to define a data distribution relevant for specific real-world problems. My (relatively uninformed) sense is that the skill set required to, say, implement transformer attention customized to the specific hierarchical memory layout of NVIDIA datacenter GPUs, or evaluate evolutionary optimization algorithms on a specific real-world problem domain, isn't necessarily something you gain in TCS itself.
When you can connect theory to the real world, it's fantastic, but my sense is that such connections are often desired and rarely found. At the very least, I'd expect that to often be a response to applied CS and not coming first from TCS: it's observed empirically that the simplex algorithm works well in practice, and then that encourages people to revisit the asymptotic analysis and refine it. I'd worry that TCS work trying to project onto applications from the blackboard would lead to less rigorous presentations and a lot of work that's only good on paper.
- Very cool!
What's meant by "it’s already too much to ask for a closed form for fibonacci numbers"? Binet's formula is usually called a closed form in my experience. Is "closed form" here supposed to mean "closed form we can evaluate without needing arbitrary-precision arithmetic"?
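For what it's worth, evaluating Binet's formula in 64-bit floats stops matching the exact values fairly quickly; a quick sketch to find the first mismatch (the exact cutoff depends on rounding details, but it lands around n ≈ 70 with IEEE doubles):

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def fib_binet(n: int) -> int:
    """Binet's formula evaluated in 64-bit floating point."""
    return round(PHI ** n / math.sqrt(5))


def fib_exact(n: int) -> int:
    """Exact iteration with Python's arbitrary-length integers."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


first_mismatch = next(n for n in range(1, 200) if fib_binet(n) != fib_exact(n))
print(first_mismatch)  # lands in the low-to-mid 70s with IEEE doubles
```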
- It seems to me that much of recent AI progress has not changed the fundamental scaling principles underlying the tech. Reasoning models are more effective, but at the cost of more computation: it's more for more, not more for less. The logarithmic relationship between model resources and model quality (as Altman himself has characterized it), phrased a different way, means that you need exponentially more energy and resources for each marginal increase in capabilities. GPT-4.5 is unimpressive in comparison to GPT-4, and at least from the outside it seems like it cost an awful lot of money. Maybe GPT-5 is slightly less unimpressive and significantly more expensive: is that the through-line that will lead to the singularity?
Compare the automobile. Automobiles today are a lot nicer than they were 50 years ago, and a lot more efficient. Does that mean cars that never need fuel or recharging are coming soon, just because the trend has been higher efficiency? No, because the fundamental physical realities of drag still limit efficiency. Moreover, it turns out that making 100% efficient engines with 100% efficient regenerative brakes is really hard, and "just throw more research at it" isn't a silver bullet. That's not "there won't be many future improvements", but it is "those future improvements probably won't be any bigger than the jump from GPT-3 to o1, which does not extrapolate to what OP claims their models will do in 2027."
AI in 2027 might be the metaphorical brand-new Lexus to today's beat-up Kia. That doesn't mean it will drive ten times faster, or take ten times less fuel. Even if high-end cars can be significantly more efficient than what average people drive, that doesn't mean the extra expense is actually worth it.
- It's not that technical work is guaranteed to be in your codebase 10 years from now, it's that customers don't want to use a product that might be good six months from now. The actors in the best position to use new AI advances are the ones with good brands, customer bases, engineering know-how that does transfer, etc.
- Previous experience isn't manual edge cases, it's training data. Humans have incredible scale (100 trillion synapses): we're incredibly good at generalizing, e.g., at picking up objects we've never seen before or understanding new social situations.
If you want to learn how to play chess, understanding the basic principles of the game is far more effective than trying to memorize every time you make an opening mistake. You surely need some amount of rote knowledge, but learning how to appraise new chess positions scales much, much better than trying to learn an astronomically small fraction of chess positions by heart.
- The time span on which these developments take place matters a lot for whether the bitter lesson is relevant to a particular AI deployment. The best AI models of the future will not have 100K lines of hand-coded edge cases, and developing those to make the models of today better won't be a long-term way to move towards better AI.
On the other hand, most companies don't have unlimited time to wait for improvements on the core AI side of things, and even so, building competitive advantages like a large existing customer base or really good private data sets for training next-gen AI tools has huge long-term benefits.
There's been an extraordinary amount of labor hours put into developing games that could run, through whatever tricks were necessary, on whatever hardware actually existed for consumers at the time the developers were working. Many of those tricks are no longer necessary, and clearly the way to high-definition real-time graphics was not in stacking 20 years of tricks onto 2000-era hardware. I don't think anyone working on that stuff actually thought that was going to happen, though. Many of the companies dominating the gaming industry now are the ones that built up brands and customers and experience in all of the other aspects of the industry, making sure that when better underlying scaling arrived, they had the experience, revenue, and know-how to make use of that tooling more effectively.
- > The models keep getting better at an exponential.
Isn't it the opposite? Marginal improvements require exponentially more investment, if we believe Altman. AI is expanding into different areas, and lots of improvements have been made in less saturated fields, but performance on older benchmarks has plateaued, especially relative to compute costs.
Even if you focus on areas where growth is rapid, the history of technology shows many, many examples of rapid growth hitting different bottlenecks and stopping. Futurists have predicted common flying cars for decades and decades, but it'll be a long, long time before helicopters are how people commute to work. The concept runs into fundamental physical limitations that technological advancement does not trivialize.
Maybe the problems facing potential AGI have relatively straightforward technological solutions. Maybe, as neural networks have already shown, it will take decades of hardware advances before the ideas conceived of today can see practice. Maybe replicating human-level intelligence requires hardware much closer to the scale of the human brain than we're capable of making right now, with a hundred trillion individual synapses each more complex than any neuron in an artificial neural network.
- The math underpinning an AI model exists independent of the hardware it's realized on. I can train a model on one GPU and someone else can replicate my results with a different GPU running different drivers, down to small numerical differences that should hopefully not have major effects.
Data isn't fungible in the same way: I can't just replace one dataset with another for research where the data generation and curation is the primary novel contribution and expect to replicate the results.
There's also a larger accountability picture: just like scientific papers that don't publish data are inherently harder to check for statistical errors or outright fraud, there's a lot of uncomfortable trust required for open-weight closed-data models. How much contamination is there for the major AI benchmarks? How much copyrighted data was used? How can we be sure that the training process was conducted as the authors say, whether from malfeasance or simple mistakes?
- What incentives do any humans have to so totally delegate the functioning of the core levers of societal power that they're unable to prevent their own extinction?
"Better machine alternatives" implies that the police and military aren't first and foremost evaluated through their loyalty. A powerful army that doesn't listen to you is not a "better" one for your purposes. The same isn't true of the economy: one could argue that our current economic system is beyond any one person's ken, but even if I don't understand how my coffee came to me and no one person would be an expert on that entire pipeline it works.
The idea that AI could lead to power concentrating in the hands of a few oligarchs who use a robot army as a more effective version of the janissaries or praetorian guard of the past certainly seems broadly plausible, although I'm not sure that the effectiveness of the Stasi is the limiting factor on autocracy or oligarchy. I don't understand how that links to human extinction. For most of human history, most people have been unable to meaningfully impact the way their society operates. That is responsible for an incalculable amount of suffering, and it's not a threat to be taken lightly, but if anything one might argue it's likely to ensure some human survival for longer than a less stable, freer system.
- It's hard to characterize the entropy of the distribution of potential diseases given a presentation: even if there are in theory many potential diagnoses, in practice a few will be a lot more common.
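As a toy illustration with made-up numbers: ten equally likely diagnoses carry log2(10) ≈ 3.3 bits of uncertainty, while a realistically skewed distribution carries far less.

```python
import math


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


uniform = [0.1] * 10           # ten equally likely diagnoses
skewed = [0.91] + [0.01] * 9   # one common diagnosis, nine rare ones
print(entropy_bits(uniform))   # ~3.32 bits
print(entropy_bits(skewed))    # ~0.72 bits: far less uncertainty in practice
```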
It doesn't really matter how much better the model is than random chance on a sample size of 5, though. There's a reason medicine is so heavily licensed: people die when they get uninformed advice. Asking o1 if you have skin cancer is gambling with your life.
That's not to say AI can't be useful in medicine: not everyone has a dermatologist friend, after all, and I'm sure for many underserved people basic advice is better than nothing. Tools could make the current medical system more efficient. But you would need to do so much more work than whatever this post did to ascertain whether that would do more good than harm. Can o1 properly direct people to a medical expert if there's a potentially urgent problem that can't be ruled out? Can it effectively disclaim its own advice when asked about something it doesn't know about, the way human doctors refer to specialists?
- There has been a huge increase in context windows recently.
I think the larger problem is "effective context" and training data.
Being technically able to use a large context window doesn't mean a model can actually remember or attend to that larger context well. In my experience, the kinds of synthetic "needle in haystack" tasks that AI companies use to show how large of a context their model can handle don't translate very well to more complicated use cases.
You can create data with large context for training by synthetically adding in random stuff, but there's not a ton of organic training data where something meaningfully depends on something 100,000 tokens back.
Also, even if it's not scaling exponentially, it's still scaling: at what point is RAG going to be more effective than just having a large context?
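Rough numbers behind that trade-off (a back-of-envelope sketch that ignores attention heads, KV caching, and FlashAttention-style optimizations; the hidden size d_model = 4096 is an assumption, not any particular model's):

```python
def attention_flops_per_layer(n_tokens: int, d_model: int = 4096) -> tuple[int, int]:
    """Very rough per-layer FLOP counts: the Q/K/V/output projections scale
    linearly in context length, the attention matrix itself quadratically."""
    projections = 8 * n_tokens * d_model ** 2  # four (n x d) @ (d x d) matmuls
    attention = 4 * n_tokens ** 2 * d_model    # QK^T scores plus the weighted sum of V
    return projections, attention


for n in (4_000, 32_000, 128_000, 1_000_000):
    proj, attn = attention_flops_per_layer(n)
    print(f"{n:>9} tokens: quadratic term is {attn / (proj + attn):.0%} of the total")
```

At short contexts the linear projection work dominates; past a hundred thousand tokens, nearly all the compute is the quadratic attention term, which is where retrieval starts looking attractive.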
- > its scaling keeps going with no end in sight.
Not only are we within sight of the end, we're more or less there. o1 isn't just scaling up parameter count 10x again and making GPT-5, because that's not really an effective approach at this point in the exponential curve of parameter count and model performance.
I agree with the broader point: for all I know, it's consistent with current neuroscience that our brains aren't doing anything more than predicting the next inputs in a broadly similar way, and drawing any categorical distinction between AI and human intelligence seems quite challenging.
I disagree that we can draw a line from scaling current transformer models to AGI, however. A model that is great for communicating with people in natural language may not be the best for deep reasoning, abstraction, unified creative visions over long-form generations, motor control, planning, etc. The history of computer science is littered with simple extrapolations from existing technology that completely missed the need for a paradigm shift.
- It's wonderfully idealistic to believe that the people who would pay $10 a gallon for gas are the people that needed it the most and not just the people with the most disposable income. I've known people who would happily pay $10 a gallon for gas they could have easily avoided using because it's pocket change to them, and similarly people who wouldn't buy it for essentially any reason.
It's additionally idealistic to believe that the government will hand out money so responsively in reaction to natural disasters that there's no reason to limit price gouging as a way of more effectively making sure poor people can buy food immediately after a hurricane. The government can't be trusted to set prices, but can be trusted to give everyone cash immediately after a hurricane knocks out the major road?
There are no good ways to allocate limited resources in the aftermath of natural disasters or similar acute supply shocks. Setting per-party quotas on buying toilet paper isn't perfectly fair, as the article points out, but to me it seems an awful lot more fair than "the rich get everything." As a limited, temporary amelioration of the kind of naked greed that the article admits most people find repugnant, in situations that are rare and have limited impact on the functioning of normal markets, it seems like common sense.
- Given how much the author talks about professional standards, one would think they would write professionally.
Also, is the author aware of why people use Anaconda? Conda environments can make it significantly easier to link CUDA or Fortran libraries properly, which are quite prevalent in scientific computing. Many people who use such code bases are professional scientists and not professional programmers, so I can understand the observation that conda-based packages are generally poorly architected.
It's almost like other people sometimes do things differently because they have different needs or have thought of things you haven't, not just because they're too stupid or uneducated to know better.
- The point isn't that it's a secret, it's that it's a string that shouldn't appear in any context besides the benchmark. It's an easy way to identify contamination in datasets. People often don't explicitly repeat them because that leads to false positives.
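A minimal sketch of that kind of contamination check (the canary value here is a placeholder, not any benchmark's real string):

```python
# The canary is a GUID-like token that should never occur outside the
# benchmark itself, so any hit in a training corpus suggests contamination.
CANARY = "BENCHMARK-CANARY-00000000-0000-0000-0000-000000000000"  # placeholder


def is_contaminated(corpus_path: str) -> bool:
    """Scan a text dump line by line for the canary string."""
    with open(corpus_path, "r", encoding="utf-8", errors="ignore") as f:
        return any(CANARY in line for line in f)
```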
AI can be used in ways that lead to deeper understanding. If a student wants AI to give them practice problems, or essay feedback, or a different explanation of something that they struggle with, all of those methods of learning should translate to actual knowledge that can be the foundation of future learning or work and can be evaluated without access to AI.
That actual knowledge is really important. Literacy and numeracy are not the same thing as mental arithmetic. Someone who can't read literature in their field (whether that's a Nature paper or a business proposal or a marketing tweet) shouldn't rely on AI to think for them, and certainly universities shouldn't be encouraging that and endorsing it through a degree.
I think the most important thing about that kind of deeper knowledge is that it's "frictional", as the original essay says. The highest-rated professors aren't necessarily the ones I've learned the most from, because deep learning is hard and exhausting. Students, by definition, don't know what's important and what isn't. If someone has done that intellectual labor and then finds AI works well enough, great. But that's a far cry from being reliant on AI output and incapable of understanding its limitations.