mgraczyk
This is meant to be some kind of Chinese room argument? Surely a 1e18 context window model running at 1e6 tokens per second could be AGI.
Personally I'm hoping for advancements that will eventually allow us to build vehicles capable of reaching the moon, but do keep me posted on those tree-growing endeavors.
Tree growing?
And I don't follow; we've had vehicles capable of reaching the moon for over 55 years.
It's about the immutability of the network at runtime. But I really don't think this is a big deal. General-purpose computers are immutable after they are manufactured, yet they exhibit a variety of useful behaviors when supplied with different data. Human intelligence also doesn't rely on designing and manufacturing revised layouts for the nervous system (within a single human's lifetime, for use by that single human) to adapt to different settings. Is the level of mutability humans rely on substantially more expressive than the limits of in-context learning? What about more unusual in-context learning techniques that behave like registers, or that perform steps of gradient descent during inference? I don't know of a good argument that all of these techniques used in ML are fundamentally not expressive enough.
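To make the "immutable network, mutable data" point concrete, here is a minimal toy sketch (a made-up PyTorch model, not any real LLM): the weights are frozen after construction, and the only thing that varies between runs is the context fed in, yet the behavior changes with that context.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    class TinyFrozenNet(nn.Module):
        # Toy stand-in for a frozen model: embed tokens, pool, predict a distribution.
        def __init__(self, vocab=16, dim=8):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.out = nn.Linear(dim, vocab)

        def forward(self, tokens):
            h = self.embed(tokens).mean(dim=0)          # pool the whole context
            return self.out(h).softmax(dim=-1)          # next-token distribution

    model = TinyFrozenNet()
    for p in model.parameters():
        p.requires_grad_(False)   # immutable at runtime, like a shipped model

    ctx_a = torch.tensor([1, 2, 3])
    ctx_b = torch.tensor([9, 9, 9])
    # Same frozen weights, different contexts -> different output distributions.
    print((model(ctx_a) - model(ctx_b)).abs().max())

The adaptation lives entirely in the data path, which is the same place in-context learning puts it.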
This argument works better for state space models. A transformer would still step through the context one token at a time, not maintain an internal 1e18 state.
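A rough sketch of the contrast (toy dimensions and matrices, nothing from any real architecture): a state space model updates a fixed-size state in place each step, while a transformer's "state" is the ever-growing token context it reprocesses.

    import numpy as np

    d_state, d_token, seq_len = 4, 3, 1000
    rng = np.random.default_rng(0)
    A = 0.9 * np.eye(d_state)                # toy state-transition matrix
    B = rng.normal(size=(d_state, d_token))  # toy input projection

    # State space model: fixed-size state, updated recurrently; memory does
    # not grow with sequence length.
    h = np.zeros(d_state)
    for _ in range(seq_len):
        x = rng.normal(size=d_token)
        h = A @ h + B @ x                    # h stays d_state-dimensional

    # Transformer-style processing: the retained "state" is the growing
    # context (conceptually the KV cache), one token appended per step.
    context = []
    for _ in range(seq_len):
        x = rng.normal(size=d_token)
        context.append(x)                    # memory grows linearly with tokens

    print(h.shape, len(context))             # (4,) 1000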
"Surely a 1e18 context window model running at 1e6 tokens per second could be AGI."
And why?
Because that's quite a bit more information processing than any human brain does.
I don't think it is quantity that matters. Otherwise supercomputers would be smart by definition.
Well no, that's not what anyone is saying.
The claim was that it isn't possible in principle for "DAGs" or "immutable architectures" to be intelligent. That statement confuses theoretical results that don't apply to how LLMs actually work (the output context is mutation).
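A tiny sketch of "the output context is mutation" (a toy deterministic function standing in for a frozen network, not any real LLM API): each forward pass is a pure DAG evaluation, but the generated token is appended back into the context, so the system as a whole carries mutable state from step to step.

    def forward_pass(context):           # stand-in for a frozen network
        return (sum(context) * 31) % 50  # deterministic toy "next token"

    context = [3, 7, 11]                 # prompt
    for _ in range(5):
        nxt = forward_pass(context)      # immutable computation graph
        context.append(nxt)              # the mutation lives in the context
    print(context)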
I'm not claiming that compute makes them intelligent. I'm pointing out that it is certainly possible, and at that level of compute it should be plausible. Feel free to share any theoretical results you think demonstrate the impossibility of "DAG" intelligence and are actually applicable.