- This new model is absurdly quick on my phone, and that's on launch day. I wonder whether it's additional capacity/lower demand or whether this is what we can expect going forward.
On a related note, why would you want to break down your tasks to that level? Surely it should be smart enough to do some of that without you asking, and you can just state your end goal.
- Token usage also needs to be factored in, specifically when thinking is enabled; these newer models find more difficult problems easier and use fewer tokens to solve them.
- I struggle to see the incentive to do this; I have similar thoughts about locally run models. The only use case I can imagine is small jobs at scale, perhaps something like autocomplete integrated into your deployed application, or extreme privacy, honouring NDAs, etc.
Otherwise, if it's a short prompt or answer, a SOTA (state of the art) model will be cheap anyway, and if it's a long prompt/answer, it's way more likely to be wrong and a lot more time/human cost is spent on "checking/debugging" any issue or hallucination, so again SOTA is better.
- The post makes quite the leap imo. That list is nuanced and basically says you should not make a judgment from it, which is not what they say.
- imo, don't waste your time coding with Gemini 3. Perhaps worth it if it's something Claude isn't helping with, as Gemini 3's reasoning is supposedly very good.
- Satya was definitely an improvement, a breath of fresh air. But in the last few years, they've started dropping the ball. Everything is either half-assed (new Outlook), released too soon and burning goodwill (new Teams), or a miss being pushed on people (Copilot integration).
(Strangely, though perhaps it's just my perception, this is roughly when the Mac M1 came out.)
- I understand your viewpoint.
LLMs these days have reasoning and can learn in context. They do touch reality: your feedback. It's also proven mathematically. Other people's scientific papers are critiqued and corrected as new feedback arrives.
This is no different to Claude Code bash-testing and fixing its own output errors recursively until the code works (roughly the loop sketched at the end of this comment).
They already deal with unknown combinations all day: our prompting.
Yes it is brittle though. They are also not very intelligent yet.
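I'm not claiming this is how Claude Code is actually implemented; it's just a minimal sketch of the kind of test-and-fix loop I mean, where `run_tests` and `ask_model` are hypothetical stand-ins (pytest is only an example test runner):

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    # Run the project's test suite and capture its output (pytest used purely as an example).
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def fix_loop(ask_model, max_rounds: int = 5) -> bool:
    # ask_model is a hypothetical callable: it gets the failure output and
    # edits the code on disk; it stands in for the LLM step.
    for _ in range(max_rounds):
        ok, output = run_tests()
        if ok:
            return True          # tests pass, stop iterating
        ask_model(output)        # feed the errors back and let the model patch the code
    return False                 # give up after max_rounds attempts
```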
- My use of the term illusion is more shallow than that; I merely use it because people think it's something separate and special.
Based on what you've described, the models already demonstrate this; it is implied, for example, in the models' attempts to game tests to ensure survival/release into the wild.
- 1. You do. You probably have a different version of that and are saying I'm wrong merely for not holding your definition.
2. That directly addresses your point. In the abstract it shows they're basically no different to multimodal models: train with different data types and it still works, perhaps even better. They train LLMs with images, videos, sound, and nowadays even robot sensor feedback, with no fundamental changes to the architecture; see Gemini 2.5.
3. That's merely an additional input point: give it sensors or have a human relay that data. Your toe is relaying its sensor information to your brain.
- Well, it depends. It doesn't have arms and legs, so it can't physically experiment in the real world. A human is currently a proxy for that; we can do its bidding and feed back results, so it's not really an issue.
Most of the time that data is already available to it, and they merely need to prove a theorem using existing historic data points and math.
For instance, the Black-Scholes-Merton equation, which won the Nobel economics prize, was derived using preexisting mathematical concepts and principles. The application and validation relied on existing data.
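For reference, the Black-Scholes PDE in its standard textbook form (V is the option value, S the underlying price, t time, sigma the volatility, r the risk-free rate; this is the usual modern statement, not necessarily the exact form from the original papers):

```latex
\frac{\partial V}{\partial t}
  + \tfrac{1}{2}\sigma^{2} S^{2} \frac{\partial^{2} V}{\partial S^{2}}
  + r S \frac{\partial V}{\partial S}
  - r V = 0
```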
- 1. What's your definition of consciousness? Let's start there. 2. Absolutely, it's a spectrum. Insects have function. 3. "Humans navigate maps built by other humans through language." You said it yourself. They use this exact same data, so why wouldn't they know it if they used it? Humans are their bodies in the physical world.
- This is exactly why I mentioned the weather.
A scientific paper has to be verifiable: you should be able to recreate the experiment and come to the same conclusion. That's very, very difficult to do with brains that have trillions of parameters and can't be controlled at the neuron level, notwithstanding the ethical issues.
We don't have a world weather simulator that is 100% accurate either, given the complex interplay and inability to control the variables, i.e. it's not verifiable. But it'd be a bit silly to say we don't know why it's going to rain at my house tomorrow.
Until then it is a hypothesis, and we can't say we know, even if the overwhelming evidence indicates that in fact we do.
- Language and math are a world model of physical reality. You could not read a book and make sense of it if this were not true.
An apple falls to the ground because of? gravity.
In real life this is the answer, and I'm very sure the pre-carved channel will also lead to gravity.
- I mean people's perception of it being a thing rather than a set of systems. But if that's your barometer, I'll say models are conscious. They may not have proper agency yet, but they are conscious.
- Absolutely, it is world model building.
- I was going to use this analogy in the exact opposite way. We do have a very good understanding of how the human brain works. Saying we don't understand how the brain works is like saying we don't understand how the weather works.
"If you put a million monkeys on typewriters you would eventually get Shakespeare" is exactly why LLMs will succeed and why humans have succeeded. If this weren't the case, why didn't humans 30,000 years ago create spacecraft, given we were endowed with the same natural "gift"?
- That's exactly the same for humans in the real world.
You're focusing too closely; abstract up a level. Your point relates to the "micro" system functioning, not the wider "macro" result (think emergent capabilities).
- When are we as humans creative outside our training data? It's very rare we actually discover something truly novel. This is often random: us stumbling onto it, brute force, or purely being in the right place at the right time.
On the other hand, until it's proven, it'd likely be considered a hallucination. You need to test something before you can dismiss it. (They did burn witches for discoveries back in the day, deeming them witchcraft.) We also reduce randomness and pre-train to avoid overfitting.
Day to day, human creative outputs are actually less exciting when you think about it further; we build on pre-existing knowledge. No different to good prompt output with the right input. Humans are just more knowledgeable and smarter at the moment.
- Given a random prompt, the overall probability of seeing a specific output string is almost zero, since there are astronomically many possible token sequences (rough numbers sketched at the end of this comment).
The same goes for humans. Most awards are built on novel research that builds on pre-existing works. This an LLM is capable of doing.
It's different if they proclaimed outright that they won't use it and then do.
Not that any of this is right, but it wouldn't be a true betrayal.
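To put a rough number on "astronomically many" from the point above, here's a quick back-of-envelope, assuming a ~50k-token vocabulary and a 100-token output (both made-up round figures):

```python
# Back-of-envelope: count of distinct 100-token outputs, assuming a ~50k-token
# vocabulary (a made-up round figure; real vocab sizes vary by model).
vocab_size = 50_000
length = 100
sequences = vocab_size ** length
# ~470 digits, so any single specific output string has essentially zero prior probability.
print(f"{len(str(sequences))} digits")
```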
- On a related note, these terms are to me a great example of the success of EU GDPR regulations, and of regulations on corporates in general. It's clear as day: additional protections are afforded to EU residents in these terms purely because of the law.