
crystal_revenge
karma: 2,341

  1. If you think this is simple, wait until you learn what oceans and forests do!

    Trees are literally CO2-based solar batteries: they take CO2 plus solar energy and store it as hydrocarbons and carbohydrates for later use. Every time you sit by a campfire, you're feeling stored solar energy. How much better does it get: free energy storage combined with CO2 scrubbing from the atmosphere!

    The ocean, meanwhile, absorbs 20-30% of all human-caused CO2 emissions with no effort on our part.

    Unfortunately, these two solutions are, apparently, "too good to be true", because we're steadily reducing the ability of both to remove carbon. Parts of the Amazon are now net emitters of CO2 [0], and the ocean can only absorb so much CO2 before it becomes dangerously acidic for ocean life.

    0. https://www.theguardian.com/environment/2021/jul/14/amazon-r...

  2. > "reasoning" problem which simply does not exist/happen when using structured generation

    The first article demonstrates exactly how to implement structured generation with CoT. Do you mean “reasoning” other than traditional CoT (like DeepSeek's)? I’ll have to look for a reference, but I recall the Outlines team handling that latter case as well.
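
    To make that concrete, here's a minimal sketch of the general pattern (my own illustration, not the article's code or any particular Outlines API): put a free-form reasoning field ahead of the answer field in the schema, and constrained decoding will force the model to emit its chain of thought before the final value.

    ```python
    # Sketch of CoT inside structured generation: the schema carries a
    # free-form reasoning field *before* the answer field, so a constrained
    # decoder can make the model "think out loud" and still return
    # machine-parseable output. Pydantic is used here only to build the schema.
    from pydantic import BaseModel, Field

    class MathAnswer(BaseModel):
        reasoning: str = Field(description="step-by-step chain of thought")
        answer: int = Field(description="final numeric answer")

    # This is the JSON schema a constrained decoder would enforce token by token.
    print(MathAnswer.model_json_schema())
    ```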

  3. (Repeating an earlier comment.) The team behind Outlines has repeatedly published evaluations showing that constrained decoding improves outputs:

    - https://blog.dottxt.ai/performance-gsm8k.html

    - https://blog.dottxt.ai/oss-v-gpt4.html

    - https://blog.dottxt.ai/say-what-you-mean.html

  4. > but never actually seen concrete evals.

    The team behind the Outlines library has produced several sets of evals and repeatedly shown the opposite: that constrained decoding improves model performance (including examples of "CoT" which the post claims isn't possible). [0,1]

    There was a paper that claimed constrained decoding hurt performance, but it had some fundamental errors which they also wrote about [2].

    People get weirdly superstitious about constrained decoding, as though it's somehow "limiting the model", when it's as simple as applying a conditional probability distribution to the logits (see the sketch after the links below). I also suspect this post exists largely to justify the fact that BAML parses the results (since the post is written by them).

    0. https://blog.dottxt.ai/performance-gsm8k.html

    1. https://blog.dottxt.ai/oss-v-gpt4.html

    2. https://blog.dottxt.ai/say-what-you-mean.html
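
    To spell out the "conditional probability distribution" point: constrained decoding just renormalizes the model's next-token distribution over whatever the grammar allows at that step. The sketch below is my own toy illustration (the token IDs and allowed set are made up), not how any particular library implements it.

    ```python
    import numpy as np

    def constrained_sample(logits: np.ndarray, allowed_ids: set) -> int:
        """Sample the next token from p(token | prefix, token is allowed).

        Constrained decoding conditions the model's next-token distribution on
        the set of tokens the grammar/schema permits at this step; it does not
        touch the weights or reorder the allowed tokens.
        """
        mask = np.full_like(logits, -np.inf)
        mask[list(allowed_ids)] = 0.0
        masked = logits + mask
        probs = np.exp(masked - masked.max())
        probs /= probs.sum()
        return int(np.random.choice(len(logits), p=probs))

    # Toy example: a 5-token vocabulary where the grammar only allows tokens 1 and 3.
    logits = np.array([2.0, 0.5, 1.0, 1.5, -1.0])
    print(constrained_sample(logits, {1, 3}))
    ```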

  5. > so they can't do anything useful

    I never claimed that. They demonstrate just how powerful Markov chains can be with a sufficiently sophisticated state representation. Obviously LLMs are useful; I have never claimed otherwise.

    Additionally, it doesn’t require any logical leaps to understand decoder-only LLMs as Markov chains: they preserve the Markov property and otherwise behave exactly like them. It’s worth noting that encoder-decoder LLMs do not preserve the Markov property and cannot be considered Markov chains.

    Edit: I saw that post and at the time was disappointed by how confused the author was about those topics and how they apply to the subject.

  6. You could model a specific instance of using your computer this way, but an FSM representation of your PC could not capture the fact that it can execute arbitrary programs.

    Your computer is strictly more computationally powerful than an FSM or PDA, even though you could represent particular states of your computer this way.

    The fact that you can approximate an arbitrary CFG with a regular language of limited recursion depth does not mean there’s no meaningful distinction between regular languages and CFGs.
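
    A toy illustration of that distinction (my own example, balanced parentheses): a counter, i.e. the power of a pushdown automaton's stack, handles arbitrary nesting depth, while any single regex/FSM only fakes it up to a fixed depth.

    ```python
    import re

    # Context-free: balanced parentheses to arbitrary depth, tracked with a
    # counter (the same power as a pushdown automaton's stack).
    def balanced(s: str) -> bool:
        depth = 0
        for ch in s:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False
        return depth == 0

    # "Regular" approximation: a regex that only accepts nesting up to depth 2.
    # Each extra depth needs a strictly larger expression, and no single finite
    # automaton covers every depth -- that gap is the regular-vs-CFG distinction.
    depth_2 = re.compile(r"^(\((\(\))*\))*$")

    print(balanced("((()))"))              # True: the counter doesn't care about depth
    print(bool(depth_2.match("((()))")))   # False: depth 3 exceeds the bounded pattern
    ```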

  7. Only if the computer you use has no memory. By definition, if you are writing to and reading from memory, you are not using an FSM.
  8. Strongly agree with this comment. Decoder-only LLMs (the ones we use) are literally Markov chains; the only (and major) difference is a radically more sophisticated state representation. Maybe "stochastic parrot" sounds overly dismissive, but it's not a fundamentally wrong understanding of LLMs.

    The RL claims are also odd because, for starters, RLHF is not "reinforcement learning" by any classical definition of RL (which almost always involves an online component). Further, chat with anyone who has kept up with the RL field and you'll quickly realize that it too is a technology that hasn't quite delivered on its promises (despite being an incredibly interesting area of research). There's no reason to assume that RL techniques will work for "agents" where they have failed to achieve widespread success in similar domains.

    I continue to be confused about why smart, very technical people can't just talk about LLMs honestly. I think we'd make much more progress if we could have conversations like "Wow! The performance of a Markov chain with a proper state representation is incredible, let's understand this better..." rather than "AI is reasoning intelligently!"

    I get why non-technical people get caught up in AI hype, but for technical people who understand LLMs it seems counterproductive. Even more surprising to me is that the hype has completely crowded out serious discussion of the technology and how to use it. There's so much opportunity lost around practical ways of incorporating LLMs into software while people wait for agents to produce mountains of slop.

  9. I wish people would be more vocal in calling out that LLMs have unquestionably failed to deliver on the 2022-2023 promises of exponential improvement at the foundation-model level. Yes, they have improved, and there is more tooling around them, but clearly the difference between LLMs in 2025 and 2023 is not as large as between 2023 and 2021. If progress were truly exponential, there would be no possibility of debating this. Which makes comments like this:

    > The fundamental challenge in AI for the next 20 years is avoiding extinction.

    seem almost absurd without further, concrete justification.

    LLMs are still quite useful; I'm glad they exist and am honestly still surprised more people don't use them in software. Last year I was very optimistic that LLMs would entirely change how we write software by becoming a fundamental part of our programming toolkit (in a similar way that ML fundamentally changed the options available to programmers for solving problems). Instead, we've just come up with more expensive ways to extend the chat metaphor (the current generation of "agents" is disappointingly far from the original meaning of agents in AI/CS).

    The thing I'm increasingly confused about is why so many people need LLMs to be more than they obviously are. I get why crypto boosters exist: if I have 100 BTC, I have a very clear interest in getting others to believe it's valuable. But with "AI", I don't quite get, for the non-VC/founder, why it matters that people foam at the mouth over AI rather than just using it for the things it's good at.

    Though I have a growing sense that this need is related to another trend I've personally started to witness: AI psychosis is very real. I know an increasing number of people who are spiraling into an LLM-induced hallucinated world. The most shocking was someone talking about how losing human relationships is inevitable because most people can't keep up with those enhanced by AI acceleration. On the softer end, I know more and more people who quietly confess how much they let AI act as a perpetual therapist, guiding their every decision (which is more than most people would let a human therapist guide their decisions).

  10. As someone who has been on Twitter since 2007, I can say it’s radically changed in the last few years, to the point of being unrecognizable.
  11. I’m pretty sure step one of going back to “the old way” is not asking ChatGPT.
  12. I've mainly been using Sonnet 4.5, so I decided to give Opus 4.5 a whirl to see if it could solve an annoying task I've been working on that Sonnet 4.5 absolutely fails at. I just started with "Are you familiar with <task> and can you help me?" and so far the response has been a resounding:

    > Taking longer than usual. Trying again shortly (attempt 1 of 10)

    > ...

    > Taking longer than usual. Trying again shortly (attempt 10 of 10)

    > Due to unexpected capacity constraints, Claude is unable to respond to your message. Please try again soon.

    I guess I'll have to wait until later to feel the fear...

  13. > Because LLMs make it that much faster to develop software

    I feel as though "facts" such as this are presented to me all the time on HN, but in my everyday job I encounter devs creating piles of slop that even the most die-hard AI enthusiasts in my office can't stand and have started to push back against.

    I know, I know: "they just don't know how to use LLMs the right way!!!" But all of the better engineers I know, the ones capable of quickly assessing the output of an LLM, tend to use LLMs much more sparingly in their code. Meanwhile, the ones who never really understood software that well in the first place are the ones building agent-based Rube Goldberg machines that ultimately slow everyone down.

    If we keep living in this AI hallucination for 5 more years, I think the only people capable of producing anything of use or value will be the devs who continued to devote some of their free time to coding in languages like Gleam, and who maintained and sharpened their ability to understand and reason about code.

  14. I've worked in similar scenarios and advocated for the Safari subscription. The most obvious problem with the physical book solution is that not everyone can read the same book at the same time. In my experience it's very common that, because some topic is particularly relevant for the team, many people will want to read the same book. At the same time, you do not want 30 copies of a book that was read by everyone 3 years ago sitting on the shelf.

    And, as far as expenses go for a research institute, $4k/mo is very inexpensive.

  15. I used to work very heavily with local models and swore by text completion despite many people thinking it was insane that I would choose not to use a chat interface.

    LLMs are designed for text completion, and the chat interface is basically a fine-tuning hack that reframes prompting as a particular form of text completion in order to give the average user a more "intuitive" interface (I don't even want to think about how many AI "enthusiasts" don't really understand this).

    But with open/local models in particular, each instruct/chat interface is slightly different. There are tools that help mitigate this, but the closer you're working to the model, the more likely you are to make a stupid mistake because you didn't understand some detail of how the instruct interface was fine-tuned.

    Once you accept that LLMs are "auto-complete on steroids", you can get much better results by programming them the way they were designed to work. It also helps a lot with prompt engineering, because you can more easily understand what the model's natural tendency is and work with it.

    It's funny because a good chunk of my comments on HN these days are combating AI hype, but man are LLMs fascinating to work with if you approach them from a slightly more clear-headed perspective.
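
    As a concrete illustration of the point above (the model name is just an example, and every instruct model wraps the conversation differently): the chat interface is nothing more than a prompt template applied before plain text completion.

    ```python
    # The "chat" interface is just a prompt template baked in at fine-tuning
    # time; underneath, the model still completes one flat string of text.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    messages = [{"role": "user", "content": "List three uses for a brick."}]

    # What the model actually sees: the conversation rendered into one string,
    # ready for ordinary next-token completion.
    chat_prompt = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(chat_prompt)

    # Raw text completion skips the template entirely and leans on the base
    # model's tendency to continue whatever pattern you set up.
    raw_prompt = "Three uses for a brick:\n1."
    ```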

  16. Right, you're describing sampling a single token, which is equivalent to sampling one step of the Markov chain. When generating output you repeat this process and update your state sequentially, which is exactly a Markov chain: given the current state (the embedding/context), the next state is conditionally independent of the past.

    Every response from an LLM is essentially the sampling of a Markov chain.
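
    Written out as code, generation really is just simulating a Markov chain. The "model" and sampler below are deliberately crude stand-ins (a toy transition table over three tokens), but the loop structure is exactly what an LLM decoder does: the next-token distribution is a function of the current state alone.

    ```python
    import random

    # Generation as a Markov chain walk: the state is the current token sequence
    # (the context), and the transition kernel is the next-token distribution.
    def generate(next_token_distribution, sample, prompt_tokens, steps, eos_id):
        state = list(prompt_tokens)
        for _ in range(steps):
            probs = next_token_distribution(state)  # depends only on the current state
            token = sample(probs)                   # draw one transition
            state = state + [token]                 # new state; the Markov property holds
            if token == eos_id:
                break
        return state

    # Toy stand-ins: a 3-token "model" whose distribution here only looks at the
    # last token (a real LLM conditions on the whole state, which is still Markov).
    def toy_distribution(state):
        table = {0: [0.1, 0.6, 0.3], 1: [0.3, 0.2, 0.5], 2: [0.8, 0.1, 0.1]}
        return table[state[-1]]

    def toy_sample(probs):
        return random.choices(range(len(probs)), weights=probs)[0]

    print(generate(toy_distribution, toy_sample, [0], steps=10, eos_id=2))
    ```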

  17. I'm sure you're already familiar with the ELIZA effect [0], but you should be a bit skeptical of what you are seeing with your eyes, especially when it comes to language. Humans have an incredible weakness for being tricked by language.

    You should be doubly skeptical now that RLHF has become standard, since the model has literally been optimized to give you the answers you find most pleasing.

    The best way to measure, of course, is with evaluations, and I have done professional LLM evaluation work for about 2 years. I've seen (and written) tons of evals, and they both impress me and inform my skepticism about the limitations of LLMs. I've also seen countless cases where people are convinced "with their eyes" that they've found a prompt trick that improves results, only to find it doesn't pan out when run on a full eval suite (toy sketch below).

    As an aside: what's fascinating is that our visual system seems much more skeptical. A slightly-off eyeball from a diffusion model will immediately set off alarms, whereas enough clever wordplay from an LLM will make us drop our guard.

    0. https://en.wikipedia.org/wiki/ELIZA_effect
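
    For anyone wondering what "running it on a full eval suite" looks like in miniature, here's the toy sketch mentioned above (ask_llm and the cases are placeholders, not any particular framework): score each prompt variant over the whole suite rather than eyeballing one or two outputs.

    ```python
    # Toy harness for the point above: a prompt tweak that "looks better" on one
    # or two examples often washes out over a full suite. `ask_llm` is a
    # placeholder for a real model call, and `cases` stands in for a real eval set.
    def accuracy(ask_llm, prompt_template, cases):
        correct = 0
        for question, expected in cases:
            answer = ask_llm(prompt_template.format(question=question))
            correct += int(answer.strip() == expected)
        return correct / len(cases)

    baseline = "Answer the question.\nQ: {question}\nA:"
    trick = "Take a deep breath and think step by step.\nQ: {question}\nA:"

    # With a real model and a few hundred cases, compare the two templates:
    # print(accuracy(ask_llm, baseline, cases), accuracy(ask_llm, trick, cases))
    ```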

  18. I think you're confusing the sampling process, and the convergence of those samples, with the warmup process (also called 'burn-in') in HMC. When doing HMC MCMC we typically don't start collecting samples right away (or, more precisely, we throw the early ones out), because we may be initializing the sampler in a region of pretty low probability density. After the chain has run a while, it tends to end up sampling from the typical set, which, especially in high-dimensional distributions, more correctly represents the distribution we actually want to integrate over.

    So for language, when I say "Bob has three apples, Jane gives him four and Judy takes two; how many apples does Bob have?", we're actually pretty far from the part of the linguistic manifold where the correct answer is likely to be. As the chain wanders this space it gets closer, until it finally, statistically, follows the path "the answer is...", and when it's sampling from that path it's in a much more likely neighborhood of the correct answer. That is, after wandering a bit, more and more of the possible paths are closer to where the actual answer lies than they would be if we had forced the model to choose early.

    edit: Michael Betancourt has a great introduction to HMC which covers warmup and the typical set: https://arxiv.org/pdf/1701.02434 (he has a ton more content that dives much more deeply into the specifics)
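
    To make the warmup idea concrete, here's a stripped-down toy (plain random-walk Metropolis, not HMC, and entirely my own illustration): start the chain far from the typical set and you can see why the early samples get discarded.

    ```python
    import numpy as np

    # Random-walk Metropolis targeting a standard normal, deliberately started
    # far out in the tail (x0 = 50) to mimic a bad initialization. This is not
    # HMC, but it shows the same phenomenon: the early part of the chain is
    # still travelling toward the typical set rather than sampling from it,
    # which is why warmup/burn-in samples get thrown out.
    def metropolis(log_p, x0, n_samples, step=1.0, seed=0):
        rng = np.random.default_rng(seed)
        x, chain = x0, []
        for _ in range(n_samples):
            proposal = x + step * rng.normal()
            if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
                x = proposal
            chain.append(x)
        return np.array(chain)

    log_p = lambda x: -0.5 * x**2  # unnormalized log density of N(0, 1)
    chain = metropolis(log_p, x0=50.0, n_samples=5000)

    warmup = 1000
    print("mean, warmup included :", chain.mean())           # pulled away from 0 by the transient
    print("mean, warmup discarded:", chain[warmup:].mean())  # close to the true mean of 0
    ```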
