This is meant to be some kind of Chinese room argument? Surely a 1e18 context window model running at 1e6 tokens per second could be AGI.

chmod775
Personally I'm hoping for advancements that will eventually allow us to build vehicles capable of reaching the moon, but do keep me posted on those tree growing endeavors.
mgraczyk OP
Tree growing?

And I don't follow, we've had vehicles capable of reaching the moon for over 55 years

anonymoushn
It's about the immutability of the network at runtime. But I really don't think this is a big deal. General-purpose computers are immutable after they are manufactured, but can exhibit a variety of useful behaviors when supplied with different data. Human intelligence also doesn't rely on designing and manufacturing revised layouts for the nervous system (within a single human's lifetime, for use by that single human) to adapt to different settings. Is the level of mutability used by humans substantially more expressive than the limits of in-context learning? What about the limits of more unusual in-context learning techniques that are register-like, or that perform steps of gradient descent during inference? I don't know of a good argument that all of these techniques used in ML are fundamentally not expressive enough.
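To make the computer analogy concrete, here's a toy sketch (just an illustration of the fixed-machine/variable-data point, not an ML system): a fixed interpreter whose behavior is determined entirely by the data it's fed, much as frozen weights plus in-context examples determine a model's behavior.

    # A fixed "machine": its code never changes after this definition,
    # analogous to frozen weights. All variation in behavior comes from
    # the data (the "program") it is handed.
    def fixed_interpreter(program, x):
        for op, arg in program:
            if op == "add":
                x += arg
            elif op == "mul":
                x *= arg
        return x

    fixed_interpreter([("add", 2)], 3)              # acts like x + 2  -> 5
    fixed_interpreter([("mul", 4), ("add", 1)], 3)  # acts like 4x + 1 -> 13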
mgraczyk OP
LLMs, considered as a function of input and output, are not immutable at runtime: they emit tokens that become part of the input the next time the function is called, which changes the effective function. That breaks most of the theoretical arguments.
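A minimal sketch of what I mean (frozen_model here is a placeholder for any fixed next-token predictor, not a real API):

    # The network itself is a pure, frozen function of its context...
    def frozen_model(context):
        return "token_%d" % len(context)

    # ...but the closed loop that includes the context is stateful.
    context = ["prompt"]
    for _ in range(3):
        token = frozen_model(context)
        context.append(token)  # the output mutates what the next call sees
    # The effective input -> behavior mapping changed on every step, even
    # though frozen_model itself never did.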
VonGallifrey
Excuse me for the bad joke, but it seems like your context window was too small.

The tree-growing comment was a reference to another comment earlier in the chain.

mgraczyk OP
It's not a tree though
This argument works better for state space models. A transformer would still step through context one token at a time, rather than maintain an internal 1e18 state.
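A rough sketch of the distinction (the shapes and update rules are illustrative, not any particular published architecture):

    import numpy as np

    d = 8
    A = np.eye(d) * 0.9  # assumed state-transition matrix
    B = np.ones(d)       # assumed input projection

    # State space model: a fixed-size internal state carries all history.
    h = np.zeros(d)
    for x in [1.0, 2.0, 3.0]:
        h = A @ h + B * x  # everything seen so far is compressed into h

    # Transformer: nothing persists between steps except the tokens;
    # each step re-attends over the whole stored context.
    context = []
    for x in [1.0, 2.0, 3.0]:
        context.append(x)  # "memory" is just the growing context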
mgraczyk OP
That doesn't matter. Are you familiar with any theoretical results in which the computation is somehow limited in ways that practically matter when the context length is very long? I am not.
"Surely a 1e18 context window model running at 1e6 tokens per second could be AGI."

And why?

mgraczyk OP
Because that's quite a bit more information processing than any human brain does.
I don't think it is quantity that matters. Otherwise supercomputers are smart by definition.
mgraczyk OP
Well no, that's not what anyone is saying.

The claim was that it isn't possible in principle for "DAGs" or "immutable architectures" to be intelligent. That statement leans on theoretical results that aren't applicable to how LLMs actually work (writing output into the context is mutation).

I'm not claiming that compute makes them intelligent. I'm pointing out that it is certainly possible, and that at that level of compute it should be plausible. Feel free to share any theoretical results you think demonstrate the impossibility of "DAG" intelligence and are applicable.
