Preferences

> In LLMs, we will have bigger weights vs test-time compute tradeoffs. A smaller model can get "there" but it will take longer.

Assuming both are SOTA, a smaller model can't produce the same results as a larger model by giving it infinite time. Larger models inherently have more room for training more information into the model.

No amount of test-retry cycle can overcome all of those limits. The smaller models will just go in circles.

I even get the larger hosted models stuck chasing their own tail and going in circles all the time.


It's true that to train more information into the model you need more trainable parameters, but when people ask for small models, they usually mean models that run at acceptable speeds on their hardware. Techniques like mixture-of-experts allow increasing the number of trainable parameters without requiring more FLOPs, so they're large in one sense but small in another.

And you don't necessarily need to train all information into the model, you can also use tool calls to inject it into the context. A small model that can make lots of tool calls and process the resulting large context could obtain the same answer that a larger model would pull directly out of its weights.

> No amount of test-retry cycle can overcome all of those limits. The smaller models will just go in circles.

That's speculative at this point. In the context of agents with external memory, this isn't so clear.

Almost all training data are on the internet. As long as the small model has enough agentic browsing ability, given it enough time it will retrieve the data from the internet.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal