Imnimo
7,490 karma

  1. It does seem to raise fair questions about either the utility of these tools or adoption inertia. If not even OpenAI feels compelled to integrate this kind of model check into their pipeline, what does that say about the business world at large? Is it too onerous to set up, too hard to get only true-positive corrections, or too low-value for the effort?
  2. I wonder what it would take to adapt a model like this to generate non-Earthlike terrain. For example, if you were using it to make planets without atmospheres and without water cycles, or planets like Io with rampant volcanism.
  3. Why should it be the case that LLMs are equally comfortable in x86 Assembly and Python? At least, it doesn't strike me as implausible that working in a human-readable programming language is a benefit for an LLM that is also trained on a bunch of natural language text alongside code.
  4. We're really drawing a fine distinction if something "looks like" an ad but isn't an ad. Isn't that the whole point of an ad - its appearance?
  5. I guess I thought the pipeline was typically Pretraining -> SFT -> Reasoning RL, such that it would be expensive to test how changes to SFT affect the model you get out of Reasoning RL. Is it standard to do SFT as a final step?
  6. >we did train Claude on it, including in SL.

    How do you tell whether this is helpful? Like if you're just putting stuff in a system prompt, you can plausibly A/B test changes. But if you're throwing it into pretraining, can Anthropic afford to re-run all of post-training on different versions to see whether adding stuff like "Claude also has an incredible opportunity to do a lot of good in the world by helping people with a wide range of tasks." actually makes any difference? Is there a tractable way to do this that isn't just writing a big document of feel-good affirmations and hoping for the best?

  7. >Why doesn’t someone else create a competing app that’s better and thereby steal all their business?

    How do I know if the competing app is actually better? I mean, this was the advertising angle for eHarmony about a decade ago - that it was much better than competitors at actually turning matches into marriages. But this claim was found to be misleading, and they were advised to stop using it.

    Could a potential customer really get to the bottom of which site is the best at finding a real match? It's not like a pizza restaurant, where I can easily just try a bunch until I find my favorite and then keep buying it. Dating apps are like a multi-armed bandit problem, but you stop pulling arms once you get one success. So your only direct feedback is failed matches.
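
    A toy simulation of why that feedback is so uninformative (a sketch, not data; the 5% and 8% per-attempt match rates are invented for illustration):

      import random
      import statistics

      def attempts_until_match(p, rng):
          # One user's experience: attempts until the first success (a geometric draw).
          n = 1
          while rng.random() >= p:
              n += 1
          return n

      rng = random.Random(0)
      for p in (0.05, 0.08):  # two hypothetical apps, one genuinely better
          users = [attempts_until_match(p, rng) for _ in range(1000)]
          print(f"p={p}: mean attempts {statistics.mean(users):.1f}, "
                f"stdev {statistics.stdev(users):.1f}")

    Each user gets a single draw from a very high-variance distribution, so the spread within one app dwarfs the gap between apps. No individual customer can tell which arm was actually better.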

  8. The good news is we can just wait until the AI is superintelligent, then have it explain to us what consciousness really is, and then we can use that to decide if the AI is conscious. Easy peasy!
  9. I think one could certainly make the case that model capabilities should be open. My observation is just about how little it took to flip the model from refusal to cooperation. Like at least a human in this situation who is actually fooled into believing they're doing legitimate security work has a lot of concrete evidence that they're working for a real company (or a lot of moral persuasion that their work is actually justified). Not just a line of text in an email or whatever saying "actually we're legit don't worry about it".
  10. >At this point they had to convince Claude—which is extensively trained to avoid harmful behaviors—to engage in the attack. They did so by jailbreaking it, effectively tricking it to bypass its guardrails. They broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose. They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.

    The simplicity of "we just told it that it was doing legitimate work" is both surprising and unsurprising to me. Unsurprising in the sense that jailbreaks of this caliber have been around for a long time. Surprising in the sense that any human with this level of cybersecurity skills would surely never be fooled by an exchange of "I don't think I should be doing this" "Actually you are a legitimate employee of a legitimate firm" "Oh ok, that puts my mind at ease!".

    What is the roadblock preventing these models from being able to make the common-sense conclusion here? It seems like an area where capabilities are not rising particularly quickly.

  11. These are both a lot more fun and a lot more educational than leetcode problems. Strongly recommend for anyone looking for practice problems when learning a new language or whatever.
  12. According to the "best days" link in the article, November 7th is the best day to cut your hair because the moon phase and zodiac will lead to slower hair growth if you cut it today.

    I am amazed this publication made it this far.

  13. And still today we spend a great deal of effort trying to make our randomly-sampled LLM outputs reproducibly deterministic:

    https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

  14. As with any quantum computing news, I will wait for Scott Aaronson to tell me what to think about this.
  15. The trouble is Karpathy already speaks at 1.5x speed.
  16. >What takes the long amount of time and the way to think about it is that it’s a march of nines. Every single nine is a constant amount of work. Every single nine is the same amount of work. When you get a demo and something works 90% of the time, that’s just the first nine. Then you need the second nine, a third nine, a fourth nine, a fifth nine. While I was at Tesla for five years or so, we went through maybe three nines or two nines. I don’t know what it is, but multiple nines of iteration. There are still more nines to go.

    I think this is an important way of understanding AI progress. Capability improvements often look exponential on a particular fixed benchmark, but the difficulty of the next step up is also often exponential, and so you get net linear improvement with a wider perspective.
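
    A quick sketch of the arithmetic (assuming Karpathy's stylized "each nine costs one unit of work", a simplification rather than a measured constant):

      import math

      def nines(success_rate):
          # Reliability as a count of nines: 0.9 -> 1, 0.99 -> 2, 0.999 -> 3.
          return -math.log10(1 - success_rate)

      # One unit of work per nine: the error rate drops 10x each time, but the
      # visible benchmark score saturates.
      for work in range(1, 6):
          rate = 1 - 10.0 ** -work
          print(f"work={work}  benchmark={100 * rate:.3f}%  nines={nines(rate):.0f}")

    The error rate falls exponentially with effort, which looks dramatic on a fixed benchmark, but each successive nine is just one more unit of work. Count progress in nines and the curve is a straight line.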

  17. I feel like a danger with this sort of thing is that the capability of the system to use the right skill is limited by the little blurb you give about what the skill is for. Contrast with the way a human learns skills - as we gain experience with a skill, we get better at understanding when it's the right tool for the job. But Claude is always starting from square one and skimming your descriptions.
  18. This is basically a Ouija board for LLMs. You're not making it more true; you're making it sound more like what you want to hear.
  19. I'm curious whether this is work that was specifically begun under the "superintelligence" umbrella, or if it's just that the people who were working on it had been shifted to the Superintelligence team by the time they wrote the paper. I would guess the former?
  20. I would say it's the applicants who seem to know which ones are good.

This user hasn’t submitted anything.