
hackinthebochs

  1. Notice that the Rule 110 string picks out a machine; it is not itself the machine. To get computation out of it, you have to actually do computational work, i.e. compare the current state and perform operations to generate the subsequent state. This doesn't just automatically happen in some non-physical realm once the string is put to paper.
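
    To make the point concrete, here is a minimal Python sketch (illustrative only, not a canonical implementation): the integer 110 encodes the rule table, but no computation happens until something actually reads the current state and executes the update.

      RULE = 110  # the 8-bit lookup table for Rule 110, encoded as an integer

      def step(cells):
          """Compute the next generation from the current one (boundaries fixed at 0)."""
          n = len(cells)
          nxt = [0] * n
          for i in range(n):
              left = cells[i - 1] if i > 0 else 0
              center = cells[i]
              right = cells[i + 1] if i < n - 1 else 0
              pattern = (left << 2) | (center << 1) | right
              nxt[i] = (RULE >> pattern) & 1
          return nxt

      state = [0] * 20 + [1]       # an initial condition written on paper does nothing
      for _ in range(10):          # the computation is this loop actually being executed
          state = step(state)
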
  2. No, that's insane. Computing is a dynamic process. A static string is not a computer.
  3. LLMs are extrapolation machines. They have some amount of hardcoded knowledge, and they weave a narrative around this knowledge base while extrapolating claims that are likely given the memorized training data. This extrapolation can take the form of logical entailment, high-probability guesses, or just wild guessing. The training regime doesn't distinguish between these kinds of prediction, so the model never learns to heavily weight logical entailment and suppress wild guessing. It turns out that much of the text we produce is highly amenable to extrapolation, so LLMs learn to be highly effective at bullshitting.
  4. LLMs are a general-purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Roughly the same architecture can generate passable images, music, or even video.

    The sequence of matrix multiplications is the high-level constraint on the space of discoverable programs. But the specific parameters discovered are what determine the specifics of information flow through the network, and hence what program is defined. The complexity of the trained network is emergent, meaning the internal complexity far surpasses that of the coarse-grained description of the high-level matmul sequence. LLMs are not just matmuls and logits.
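
    To make the architecture-vs-parameters point concrete, here is a toy Python sketch (not a transformer; just a single matmul and a threshold, with illustrative weight values): the architecture is fixed, but which program it computes is determined entirely by the parameters plugged into it.

      import numpy as np

      def fixed_architecture(x, W, b):
          # The same "matmul + nonlinearity" is executed every time;
          # only the parameters change.
          return (x @ W + b > 0).astype(int)

      inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

      # One parameter setting picks out the AND program...
      W_and, b_and = np.array([[1.0], [1.0]]), np.array([-1.5])
      # ...another picks out the OR program.
      W_or, b_or = np.array([[1.0], [1.0]]), np.array([-0.5])

      print(fixed_architecture(inputs, W_and, b_and).ravel())  # [0 0 0 1]
      print(fixed_architecture(inputs, W_or, b_or).ravel())    # [0 1 1 1]

    A trained LLM is the same idea at vastly larger scale: a fixed stack of matmuls, with billions of learned parameters determining which program that stack implements.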

    [1] https://x.com/karpathy/status/1582807367988654081

  5. This brain-receiver idea just isn't a very good theory. For one, it increases the complexity of the model without any corresponding increase in explanatory power. The mystery of consciousness remains, except now you have all this extra mechanism involved.

    Another issue is that the brain is overly complex for consciousness to just be received from elsewhere. Typically a radio is much less complex than the signal being received, or at least less complex than the potential space of signals it is possible to receive. We don't see that with consciousness. In fact, consciousness seems to be far less complex than the brain that supports it. The issue of the specificity of brain damage and the corresponding specificity in conscious deficits also points away from the receiver idea.

  6. Yes, causal scope isn't what makes it virtual. It's what makes us say it's not real. The real/virtual dichotomy is what I'm attacking. We treat virtual as the opposite of real, therefore a virtual consciousness is not real consciousness. But this inference is specious. We mistake the causal scope issue for the issue of realness. We say the virtual candle isn't real because it can't burn our hand. What I'm saying is that, actually, the virtual candle can't burn our hand because of the disjoint causal scope. But the causal scope doesn't determine what is real; it just determines the space and limitations of potential causal interactions.

    Real is about an object having all of the essential properties for that concept. If we take it as essential that candles can burn our hand, then the virtual candle isn't real. But it is not essential to consciousness that it is not virtual.

  7. That's a fair reading but not what I was going for. I'm trying to argue for the irrelevance of causal scope when it comes to determining realness for consciousness. We are right to privilege non-virtual existence when it comes to things whose essential nature is to interact with our physical selves. But since no other consciousness directly physically interacts with ours, it being "real" (as in physically grounded in a compatible causal scope) is not an essential part of its existence.

    Determining what is real by judging causal scope is generally successful but it misleads in the case of consciousness.

  8. Computation doesn't care about its substrate. A simulation of a computation is just a computation.
  9. >If we don't think the candle in a simulated universe is a "real candle", why do we consider the intelligence in a simulated universe possibly "real intelligence"?

    I can smell a "real" candle, a "real" candle can burn my hand. The term real here is just picking out a conceptual schema where its objects can feature as relata of the same laws, like a causal compatibility class defined by a shared causal scope. But this isn't unique to the question of real vs simulated. There are causal scopes all over the place. Subatomic particles are a scope. I, as a particular collection of atoms, am not causally compatible with individual electrons and neutrons. Different conceptual levels have their own causal scopes and their own laws (derivative of more fundamental laws) that determine how these aggregates behave. Real (as distinct from simulated) just identifies causal scopes that are derivative of our privileged scope.

    Consciousness is not like the candle because everyone's consciousness is its own unique causal scope. There are psychological laws that determine how we process and respond to information. But each of our minds is causally isolated from the others. We can only know of each other's consciousness by judging behavior. There's nothing privileged about a biological substrate when it comes to determining "real" consciousness.

  10. Your view is missing the forest for the trees. You see individual objects but miss the aggregate whole. You have a hard time conceiving of how exotic computers can be conscious because we are scale chauvinists by design. Our minds engage with the world on certain time and length scales, and so we naturally conceptualize our world based on entities that exist on those scales. But computing is necessarily scale independent. It doesn't matter to the computation whether it is running on some 100 GHz substrate or a 0.0001 Hz one. It doesn't matter if it's running on a CPU chip the size of a quarter or spread out over the entire planet. Computation is about how information is transformed in semantically meaningful ways. Scale just doesn't matter.

    If you were a mind supervening on the behavior of some massive time/space scale computer, how would you know? How could you tell the difference between running on a human making marks with pen and paper and running on a modern CPU? Your experience updates based on information transformations, not based on how fast the fundamental substrate is changing. When your conscious experience changes, that means your current state is substantially different from your prior state and you can recognize this difference. Our human-scale chauvinism gets in the way of properly imagining this. A mind running on a CPU or a large collection of human computers is equally plausible.

    A common question people like to ask is "where is the consciousness" in such a system. This is an important question if only because it highlights the futility of such questions. Where is Microsoft Word when it is running on my computer? How can you draw a boundary around a computation when there are a multitude of essential and non-essential parts of the system that work together to construct the relevant causal dynamic? It's just not a well-defined question. There is no one place where Microsoft Word occurs, nor is there any one place where consciousness occurs in a system. Is state being properly recorded and correctly leveraged to compute the next state? The consciousness is in this process.
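
    A toy Python sketch of the scale-independence point (illustrative transition rule and delays, nothing more): the same update rule run fast or slow yields the identical sequence of states, and nothing in that sequence records how quickly, or on what substrate, it was produced.

      import time

      def transition(state):
          """Some fixed state-update rule; the details don't matter here."""
          return (state * 31 + 7) % 1000

      def run(initial, steps, delay_seconds=0.0):
          trajectory = [initial]
          for _ in range(steps):
              if delay_seconds:
                  time.sleep(delay_seconds)   # stand-in for a slow substrate
              trajectory.append(transition(trajectory[-1]))
          return trajectory

      fast = run(42, 10)                      # e.g. a CPU-speed run
      slow = run(42, 10, delay_seconds=0.01)  # e.g. pen-and-paper, scaled way down
      assert fast == slow                     # the computation itself is unchanged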

  11. The point is that the word grooming doesn't say enough to determine whether something is harmful. You just have to do the work to defend your claim to harm. But the grooming dynamic will always be inherently suspect when it involves other people's kids.
  12. Not saying they should be banned. Not all grooming is bad actually. But that is the purpose of ideological organizations to a large degree.
  13. Yes, an active targeted process. No, it doesn't have to be aimed at "someone". It can be aimed at creating an environment conducive to one's interests in some class of people.

    Yes, intentionally targeting kids with an ideology is grooming. It is preparing them to be amenable to your ideology to increase acceptance of it in the broader culture. At least that's the most innocuous reading of it.

  14. Unless you think he fabricated the pictures, I'm not sure what relevance the trustworthiness of the messenger has in this instance.
  15. Now do anorexia, bulimia, or any number of social contagions. The difference between being allowed to be who you are vs. being encouraged into a lifestyle is not easy to distinguish.
  16. >Most of the rest of the world subsidizes student tuition so students dont pay much out of pocket.

    And they also severely restrict who can attend university. Of course this is a non-starter in the current US political environment.

  17. What do you think these in-principle limitations are that preclude a computer running the right program from reaching general intelligence?
  18. >Yes, and most with a background in linguistics or computer science have been saying the same since the inception of their disciplines

    I'm not sure what authority linguists are supposed to have here. They have gotten approximately nowhere in the last 50 years. "Every time I fire a linguist, the performance of the speech recognizer goes up".

    >Grammars are sets of rules on symbols and any form of encoding is very restrictive

    But these rules can be arbitrarily complex. Hand-coded rules have a pretty severe complexity bound. But LLMs show these are not in-principle limitations. I'm not saying theory has nothing to add, but perhaps we should consider the track record when placing our bets.

  19. Just when the "brain doesn't finish developing until 25" nonsense has finally waned from the zeitgeist, here comes a new pile of rubbish for people to latch onto. Not that the research itself is rubbish, but how they name/describe the phases certainly is. The "adolescent" and "adult" phases don't have any correspondence to what we normally think of as those developmental periods. That certainly won't stop anyone from using this as justification for whatever normative claim they want to make, though. It's just irresponsible.
  20. LLMs aren't language models; they are a general-purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Roughly the same architecture can generate passable images, music, or even video.

    It's not that language generation is all there is to AGI, but that to sufficiently model text that is about the wide range of human experiences, we need to model those experiences. LLMs model the world to varying degrees, and perhaps in the limit of unbounded training data, they can model the human's perspective in it as well.

    [1] https://x.com/karpathy/status/1582807367988654081

  21. >LLM tech will never lead to AGI. You need a tech that mimics synapses. It doesn’t exist.

    Why would you think synapses (or their dynamics) are required for AGI rather than being incidental owing to the constraints of biology?

    (This discussion never goes anywhere productive but I can't help myself from asking)

  22. They are deterministic in the sense that the inference process scores every word in the vocabulary in a deterministic manner. This score map is then sampled from according to the temperature setting. Non-determinism is artificially injected for ergonomic reasons.

    >But I think there’s still the question if this process is more similar to thought or a Markov chain.

    It's definitely far from a Markov chain. Markov chains treat the past context as a single unit, an N-tuple that has no internal structure. The next state is indexed by this tuple. LLMs leverage the internal structure of the context, which allows a large class of generalizations that Markov chains necessarily miss.
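
    A small Python sketch of the determinism point (illustrative logits, not a real model): the forward pass assigns a deterministic score to every token in the vocabulary, and randomness enters only at the sampling step, controlled by temperature.

      import numpy as np

      def softmax(x):
          z = np.exp(x - np.max(x))
          return z / z.sum()

      # Deterministic scores for a toy 4-token vocabulary.
      logits = np.array([2.0, 1.0, 0.5, -1.0])

      def sample(logits, temperature, rng):
          if temperature == 0:
              return int(np.argmax(logits))        # no randomness: always the top token
          probs = softmax(logits / temperature)    # temperature reshapes the distribution
          return int(rng.choice(len(logits), p=probs))

      rng = np.random.default_rng(0)
      print([sample(logits, 0.0, rng) for _ in range(5)])  # same token every time
      print([sample(logits, 1.0, rng) for _ in range(5)])  # randomness injected at sampling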
