mediaman · 5,609 karma

  1. RLVR is not offline learning. It's not learning from a static dataset. These are live rollouts that are being verified and which update the weights at each pass based on feedback from the environment.

    You might argue that traditional RL involves multiple states the agent moves through. But autoregressive LLMs are the same: a forward pass generating a token also creates a change in state.

    After training, the weights are fixed, of course, but that is the case for most traditional RL systems. RL does not intrinsically mean continual weight updates in deployment, which carry their own problems.

    From the premise that RLVR can be used to benchmaxx (true!), it does not follow that it is only good for that. A toy sketch of the live-rollout loop is below.
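
    To make the live-rollout point concrete, here is a toy sketch of an RLVR-style loop, a pure-Python stand-in for a real model and trainer: candidates are sampled from the current policy, verified against ground truth, and the policy is updated immediately from that feedback rather than from a static dataset.

        import random

        # Toy "policy": an unnormalized distribution over candidate answers
        # to "2 + 2". In real RLVR this would be an LLM and a gradient step.
        policy = {"4": 1.0, "5": 1.0, "22": 1.0}

        def sample(policy):
            total = sum(policy.values())
            r = random.uniform(0, total)
            for answer, weight in policy.items():
                r -= weight
                if r <= 0:
                    return answer
            return answer

        def verify(answer):
            # Verifiable reward: a mechanical check against ground truth.
            return 1.0 if answer == "4" else 0.0

        for step in range(200):
            rollout = sample(policy)   # live rollout from the *current* policy
            reward = verify(rollout)   # feedback from the environment
            policy[rollout] *= 1.0 + 0.5 * reward  # immediate weight update

        print(max(policy, key=policy.get))  # converges on "4"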

  2. I don't understand this. In Supabase, the default is to turn on RLS for new tables. If you turn it on and have no policy set, no user can fetch anything from the table.

    You have to explicitly create a read-all policy for anon keys, with no constraints, before anyone can get access to it.

    The default is secure.

    If you turn off RLS, there are warnings everywhere that the table is unsecured.

    The author goes on to compare this with PocketBase, which he says you "have to go out of your way" to make insecure. You have to go out of your way with Supabase, as well!

    I wonder if the author tested this? I do agree that some third-party website builders that use Supabase on the back end could have created insecure defaults, but that's not Supabase's fault. You can check the default yourself; a sketch is below.
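
    A minimal check with the supabase-py client, assuming a fresh table (here called "notes") with RLS enabled and no policies defined; the URL and anon key are placeholders:

        from supabase import create_client

        # Placeholders: substitute your project's URL and anon (public) key.
        client = create_client("https://YOUR_PROJECT.supabase.co", "YOUR_ANON_KEY")

        # With RLS on and no read policy, the anon role gets no rows back,
        # rather than leaking the table's contents.
        result = client.table("notes").select("*").execute()
        print(result.data)  # -> []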

  3. Much of this is due to vastly better post-training RL, not models that are much bigger. The idea that most of these gains come from training really big models, or from throwing immensely larger amounts of compute at it, is not really true.
  4. This is a regurgitation of the old critique of history: what's its purpose? What do you use it for? What is its application?

    One answer is that the study of history helps us understand that the views we hold as "obviously correct" today are as contingent on our current social norms and power structures (and their history) as the "obviously correct" views and beliefs of some point in the past.

    It's hard for most people to view two mutually exclusive moral views as both "obviously correct," because we are shaped by a milieu that accepts only one of them as correct.

    We look back at some point in history, and say, well, they believed these things because they were uninformed. They hadn't yet made certain discoveries, or had not yet evolved morally in some way; they had not yet witnessed the power of the atomic bomb, the horrors of chemical warfare, women's suffrage, organized labor, or widespread antibiotics and the fall of extreme infant mortality.

    An LLM trained on the writing of a specific period - without interference from the actual path history subsequently took - gives us an interactive compression of the views of that moment, uncolored by later events.

    In that sense - if you believe there is any redeeming value to history at all; perhaps you do not - this is an excellent project! It's not perfect (it is built only from writings, not what people actually said), but we have no other available mass compression of the social norms of a specific time, untainted by the views of subsequent interpreters.

  5. If your position is that brains are not actually bound by the laws of physics -- that they operate on some other plane of existence unbound by any scientifically tested principle -- then it is not only your ideological opposites who have quasi-religious faith in a thing not fully comprehended.
  6. Read the full post. Partway down you will see they agree with you that getting an API key is not hard.

    Paying is hard. And setting it up is confusing: you have to create a Vertex billing account, go through a cumbersome process to connect your AIStudio account to it, and bring over a "project," which then disconnects all the time and has to be re-selected to use Nano Banana Pro or Gemini 3. It's a very bad process.

    It's easy to miss this because they are very generous with the free tier, but Gemini 3 is not free.

  7. A person who chooses to evaluate all ideas by way of their source tells me all I need to know about their opinion.
  8. Serving a subpoena in this manner is for publicity, not process.

    In this case the public defender is issuing a subpoena for records related to the trespassing case. The subpoena concerns OpenAI, not Sam Altman personally. They could serve any reasonably senior person in the company; it can also be done by certified mail.

  9. Cultural antibodies take a long time to develop. In twenty years you will see more common resistance to what's being produced today, but less to whatever new innovation is released then.

    See, for example, the slowly declining efficacy of banner ads, as each cohort of computer users learned to ignore them while they still retained efficacy on newer vintages of users.

  10. We've had a great experience with JSONata too.

    It's nice because we can just put the JSONata expression into a db field, so we can have arbitrary data transforms for different customers, for different data structures coming or going. They can be set up just by editing the expression via the site, without having to worry about sandboxing it (other than resource exhaustion from recursive loops). It really sped up the iteration process for configuring transforms. A sketch of the shape of this is below.
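
    Here is the shape of that setup, with a hypothetical evaluate_jsonata() standing in for whatever JSONata binding you use (the reference implementation is JavaScript); the expressions are illustrative and live in an ordinary db column:

        # Per-customer transforms stored as data, not code.
        def evaluate_jsonata(expression: str, data: dict):
            # Hypothetical: bind this to a real JSONata evaluator.
            raise NotImplementedError

        # In practice these rows come from a db table keyed by customer id,
        # and are edited via the site.
        customer_transforms = {
            "acme": '{"id": order_id, "total": $sum(lines.price)}',
            "globex": '{"ref": reference, "amount": grand_total}',
        }

        def transform_payload(customer_id: str, payload: dict):
            expression = customer_transforms[customer_id]
            return evaluate_jsonata(expression, payload)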

  11. That is a common refrain from people who have no domain expertise in anything outside of tech.

    Spend a few years in an insurance company, a manufacturing plant, or a hospital, and the assertion that the frontier labs will figure it out appears patently absurd. (After all, it takes humans years to understand just a part of these institutions, and they have well-functioning memory.)

    This belief that tier 5 is useless is itself a tell of a vulnerability: the LLMs are advancing fastest in domain-expertise-free generalized technical knowledge. If you have no domain expertise outside of tech, you are most vulnerable to their march of capability, and it is those with domain expertise who will rely increasingly less on those who have nothing to offer but generalized technical knowledge.

  12. I both read a fair amount (and long books, 800-1,000 page classic Russian novels, that kind of thing) and use LLMs.

    I quite like using LLMs to learn new things. But I agree: I can't stand reading blog posts written by LLMs. Perhaps it is about expectations. From a blog post I expect to gain a view into an individual's thinking; from an AI, I am looking into an abyss of whirring matrix-shaped gears.

    There's nothing wrong with the abyss of matrices, but if I'm at a party and start talking with someone, and get the whirring sound of gears instead of the expected human banter, I'm a little disturbed. It feels the same for blog content: these are personal communications; machines have their place and their use, but if I get a machine when I'm expecting something personal, it runs counter to expectations.

  13. We do this simply by injecting a company-defined list of proper names/terms into the prompt, within <special_terms>, and telling the model to use that information to assist with spelling. It works pretty well; a sketch is below.
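
    The prompt assembly is nothing fancy; a minimal sketch (the term list and function name are illustrative, the tag is the one we use):

        # Company-defined terms; in practice these come from configuration.
        SPECIAL_TERMS = ["Supabase", "JSONata", "Nguyen", "Bjørn"]

        def build_prompt(transcript: str) -> str:
            terms = "\n".join(SPECIAL_TERMS)
            return (
                "<special_terms>\n" + terms + "\n</special_terms>\n\n"
                "Use the terms above to assist with the spelling of proper "
                "names and terms in the following text.\n\n" + transcript
            )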
  14. He is telling you how it mechanically works. Your comment about it “understanding what that means” because it is an NLP system seems bizarre, but maybe you mean it in some other way.

    Are you proposing that the attention input context is gone, or that the attention mechanism’s context cost is computationally negated in some way, simply because the system processes natural language? Having the attention mechanism selectively isolate context on command would be an important technical discovery.
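
    The cost in question is structural. Standard self-attention scores every token against every other token in the context, so the work grows quadratically with context length regardless of what the tokens say; a back-of-the-envelope sketch:

        # Self-attention computes an n x n score matrix over the context.
        def attention_scores(n_tokens: int) -> int:
            return n_tokens * n_tokens

        for n in (1_000, 10_000, 100_000):
            print(f"{n:>7} tokens -> {attention_scores(n):.1e} pairwise scores")
        # 1e6, 1e8, 1e10: the cost does not vanish because the input
        # happens to be natural language.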

  15. Bread is a great example! You can buy a loaf for $3-4. It is not a 50x markup. Like growing your own veggies, baking bread is for fun, not for economics.

    But the cloud is different. None of the financial scale benefits are passed on to you. You save serious money running it in-house. The arguments around scale have no validity for the vast, vast majority of use cases.

    Vercel isn't selling bread: they're selling a fancy steak dinner, and yes, you can make steak at home for much less, and if you eat fancy steak dinners at fancy restaurants every night you're going to go broke.

    So the key is to understand whether your vendors are selling you bread, or a fancy steak dinner, and to not make the mistake of getting the two confused.

  16. The point about synthetic query generation is good. We found users wrote very poor queries, so we initially had the LLM generate synthetic queries. But then we found that the results could vary widely based on the specific synthetic query it generated, so we had it create three variants (all in one LLM call, so you can prompt it to generate a wide variety instead of getting three very similar ones back), ran the searches in parallel, and then used reciprocal rank fusion to combine the lists into a set of broadly strong performers (sketch below). For the searches we use hybrid dense + sparse BM25, since dense embeddings don't work well for technical words.

    This, combined with a subsequent reranker, basically eliminated any of our issues on search.
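
    Reciprocal rank fusion itself is only a few lines; a minimal sketch (doc ids illustrative, k=60 is the conventional constant):

        from collections import defaultdict

        def reciprocal_rank_fusion(result_lists, k=60):
            scores = defaultdict(float)
            for results in result_lists:
                for rank, doc_id in enumerate(results, start=1):
                    scores[doc_id] += 1.0 / (k + rank)
            return sorted(scores, key=scores.get, reverse=True)

        # One ranked list per synthetic query variant.
        fused = reciprocal_rank_fusion([
            ["doc_a", "doc_b", "doc_c"],
            ["doc_b", "doc_d", "doc_a"],
            ["doc_b", "doc_a", "doc_e"],
        ])
        print(fused)  # doc_b and doc_a surface as broadly strong performers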

  17. I have a lot of success asking models such as Gemini to OCR the text and then describe any images in the document, including charts. I have it format the sections with XML-ish tags. This also works for tables.
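
    A sketch of the approach with the google-generativeai Python client (the model name, tag names, and file name are illustrative):

        import google.generativeai as genai

        genai.configure(api_key="YOUR_API_KEY")  # placeholder
        model = genai.GenerativeModel("gemini-1.5-pro")

        PROMPT = (
            "OCR this document. Wrap each section in XML-ish tags such as "
            "<heading>, <paragraph>, and <table>, and for any image or "
            "chart add a <figure_description> describing what it shows."
        )

        page = genai.upload_file("page_001.png")
        response = model.generate_content([PROMPT, page])
        print(response.text)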
  18. RL works great in verifiable domains like math and, to some significant extent, coding.

    Coding is an interesting example because as we change levels of abstraction from the syntax of a specific function to, say, the architecture of a software system, the ability to measure verifiable correctness declines. As a result, RL-tuned LLMs are better at creating syntactically correct functions but struggle as the abstraction layer increases.

    In other fields, it is very difficult to verify correctness. What is good art? Here, LLMs and their ilk can still produce good output, but it becomes hard to produce "superhuman" output, because in nonverifiable domains their capability depends on mimicry; it is RL that gives the AI the ability to perform at superhuman levels. With RL, rather than merely fitting its parameters to a set of extant data, it can follow the scent of a ground-truth signal of excellence. No scent, no outperformance.
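
    The asymmetry is easy to state in code: a verifiable reward is a mechanical check, while a nonverifiable one has no ground truth to check against (function names illustrative):

        # Verifiable domains: reward is a mechanical check against ground truth.
        def math_reward(model_answer: str, ground_truth: str) -> float:
            return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

        def code_reward(tests_passed: int, tests_total: int) -> float:
            return tests_passed / tests_total  # unit tests as the verifier

        # Nonverifiable domain: there is no such check to write.
        def art_reward(image) -> float:
            raise NotImplementedError("what is good art?")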

  19. It's a false dichotomy. LLMs are already being trained with RL to have goal-directedness.

    He is right that non-RL'd LLMs are just mimicry, but the field has already moved beyond that.

  20. We're going in the opposite direction.

    High minimum wages are making it nearly impossible to run small restaurants. We're seeing a restaurant apocalypse in places like Seattle and Denver, because high wages result in high prices, which lowers customer volume, which raises fixed costs per unit, which causes a death spiral. Denver, for example, has 30% fewer restaurants now because it's so hard to run one profitably.

    Lots of little tasty restaurants and high wages for service staff are basically incompatible. Many people in the city don't realize this and advocate for both.

    Of course, high wages are desirable if they can be accomplished without tradeoffs, but the tradeoffs are there.

    High minimum wages favor high-volume, fast service chain restaurants that are more labor efficient.

    Perhaps eventually automation will relax this tradeoff, but I would expect automation to primarily benefit corporate restaurant chains over small local eateries, unless the automation is so general that any restaurant can start using it without technical expertise or R&D.
