
13years
1,574 karma
Software Engineer, 30+ years. Fortune 100 companies.

Writes at https://www.mindprison.cc


  1. Thanks, interesting reference. However, their analysis doesn't tell us much about the quality of Grokipedia. I would be more interested in something like hallucination density, but I know of no way that could be measured.
  2. > We suck at measuring ourselves.

    That is a certainty. I was once asked to calculate how much time we would save through our company's code reuse program. I read all the material on estimating savings, but then proved it was all ridiculous.

    I came across a study that attempted to estimate how long it took to build libraries that had already been built. In this case, there were no unknowns; you had the entire code. Estimates were off by orders of magnitude. If we can't estimate the work when the work is already done, how could we ever estimate the work when we know less?

  3. Not sure how you get around the contamination problems. I use these every day, and they are extremely prone to making errors that are hard to perceive.

    They are not reliable tools for any tasks that require accurate data.

  4. A philosophical lens can sometimes help us perceive the root drivers of a set of problems. I sometimes call AI humanity's great hubris experiment.

    AI's disproportionate capability to influence and capture attention, relative to its productive output, is a significant part of so many negative outcomes.

  5. Yes, that's an excellent description.
  6. I think it is creating a growing interest in authenticity among some. It still feels like a minority opinion, though. Every content platform is being flooded with AI content. Social media floods it into all of my feeds.

    I wish I could push a button and filter it all out. But that's the problem we have created. It is nearly impossible to do. If you want to consume truly authentic human content, it is nearly impossible to know what is genuine. Everyone I interact with now might just be a bot.

  7. > Myself I believe technology and eventually AI were our fate once we became intelligence optimizers.

    Yes, everyone talks about the Singularity, but I see the instrumental point of concern to be something prior, which I've called the Event Horizon. We are optimizing, but we no longer have any understanding of the outcomes.

    "The point where we are now blind as to where we are going. The outcomes become increasingly unpredictable, and it becomes less likely that we can find our way back as it becomes a technology trap. Our existence becomes dependent on the very technology that is broken, fragile, unpredictable, and no longer understandable. There is just as much uncertainty in attempting to retrace our steps as there is in going forward."

  8. > AI is not inevitable fate. It is an invitation to wake up. The work is to keep dragging what is singular, poetic, and profoundly alive back into focus, despite all pressures to automate it away.

    This is the struggle. The race to automate everything. Turn all of our social interactions into algorithmic digital bits. However, I don't think people are just going to wake up from calls to wake up, unfortunately.

    We typically only wake up to anything once it is broken. Society has to break from the over optimization of attention and engagement. Not sure how that is going to play out, but we certainly aren't slowing down yet.

    For example, take a look at the short clip I have posted here. It is an example of just how far everyone is scaling bot and content farms. It is an absolute flood of noise into all of our knowledge repositories. https://www.mindprison.cc/p/dead-internet-at-scale

  9. > that the author also tripped over

    The evidence for unfaithful reasoning comes from Anthropic. It is in their system card and this Anthropic paper.

    https://assets.anthropic.com/m/71876fabef0f0ed4/original/rea...

  10. But it is not an illusion, and the answers make no sense. In some cases the models pick exactly the opposite answer. No human would do this.

    Yes, being outside the training patterns is the point. I have no doubt that if you trained LLMs on this type of pattern with millions of examples, they could get the answers reliably.

    The whole point is that humans do not need data training. They understand such concepts from one example.

  11. Take a look at this vision test - https://www.mindprison.cc/i/143785200/the-impossible-llm-vis...

    It is an example that shows the difference between understanding and patterns. No model actually understands the most fundamental concept of length.

    LLMs can seem to do almost anything for which there are sufficient patterns to train on. However, there aren't infinite patterns available to train on. So, edge cases are everywhere. Such as this one.

  12. The bar you asked for was "meaningful progress". And since you state that "both are very helpful metrics", it seems the bar is met to the degree it can be.

    I don't think we will see a definitive test, as we can't even precisely define it. Other than heuristic signals such as those stated above, the only thing left is just observing performance in the real world. But I think the current progress as measured by "benchmarks" is terribly flawed.

  13. I think it is actually worse than that. The hype labs are still defiantly trying to convince us that somehow merely scaling statistics will lead to the emergence of true intelligence. They haven't reached the point of being "surprised" as of yet.
  14. > most people can’t reliably interpret the meaning of complex or unfamiliar text

    But LLMs fail the most basic tests of understanding that don't require complexity. They have read everything that exists. What would even be considered unfamiliar in that context?

    > RFK Jr. is antivax because he misunderstands all the information he sees about the benefits of vaccines.

    These are areas where information can be contradictory. Even this statement is questionable in its most literal interpretation. Has he made such a statement? Is that a correct interpretation of his position?

    The errors we are criticizing in LLMs are not areas of conflicting information or difficult-to-discern truths. We are told LLMs are operating at PhD level. Yet, when asked to perform simple everyday tasks, they often fail in ways no human normally would.

  15. We are capable of much more, which is why we can perform tasks when no prior pattern or example has been provided.

    We can understand concepts from the rules. LLMs must train on millions of examples. A human can play a game of chess from reading the instruction manual without ever witnessing a single game. This is distinctly different from pattern-matching AI.

  16. Essentially, pattern matching can outperform humans at many tasks. Just as computers and calculators can outperform humans at tasks.

    So it is not that LLMs can't be better at tasks; it is that they have specific limits that are hard to discern. Pattern matching on the entire world of data is an opaque tool in which we cannot easily perceive where the walls are and where it falls completely off the rails.

    Since it is not true intelligence, but at times a good mimic, we will continue to struggle with unexpected failures, as it simply has no understanding of the task it is given.
