So whether a model is or isn't "a reasoning model" comes down to the extent of a fine-tune.
Are there specific benchmarks that compare models against themselves with and without scratchpads, with high with:without ratios marking the reasonier models?
Curious also how much a generalist model's one-shot responses degrade with reasoning post-training.
Yep, it's pretty common for labs to release both an instruction-tuned and a thinking-tuned variant of the same model and then bench them against each other. For instance, if you scroll down to "Pure text performance" there's a comparison of the two Qwen variants' performance: https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking
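If you want to poke at the with/without-scratchpad comparison yourself, here's a minimal sketch assuming a Qwen3-style chat template that accepts an enable_thinking flag (the text-only Qwen3 checkpoints expose this; the VL pair above ships the two modes as separate checkpoints, so there you'd swap model names instead):

```python
# Minimal sketch: same model, same prompt, with and without the thinking scratchpad.
# Assumes a Qwen3-style chat template with an enable_thinking flag.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # any Qwen3 text checkpoint with the thinking toggle
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]

for thinking in (True, False):
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,  # toggles the <think>...</think> scratchpad
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    text = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    print(f"--- enable_thinking={thinking} ---\n{text}\n")
```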
Yes, simplest example: https://www.anthropic.com/engineering/claude-think-tool
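From memory of that post, the whole "think" tool is basically a no-op with a single string parameter; the exact field wording below is illustrative, not copied from the article:

```python
# Sketch of a no-op "think" tool: its only purpose is to give the model a
# sanctioned place to dump intermediate reasoning during a tool-use loop.
import anthropic

think_tool = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It will not obtain new "
        "information or change anything; it just logs the thought. Use it "
        "when complex reasoning or a scratchpad is needed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any tool-use-capable model
    max_tokens=1024,
    tools=[think_tool],
    messages=[{"role": "user", "content": "Plan the refund for order #1234 under our policy."}],
)
```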
This can be done with finetuning/RL on an existing pre-formatted dataset, or with format-based RL where the model is rewarded both for answering correctly and for using the right format.
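A minimal sketch of the second flavour, assuming a GRPO/PPO-style setup where each sampled completion gets a scalar reward; the tag names and weights are made up for illustration:

```python
# Format-based reward: partial credit for using the scratchpad format,
# full credit only when the final answer is also correct.
import re

THINK_RE = re.compile(r"^<think>(.*?)</think>\s*(.+)$", re.DOTALL)

def reward(completion: str, gold_answer: str) -> float:
    score = 0.0
    match = THINK_RE.match(completion.strip())
    if match:
        score += 0.5            # reward for using the scratchpad format at all
        final = match.group(2)  # only grade the text after the scratchpad
    else:
        final = completion
    if gold_answer.strip() in final:
        score += 1.0            # reward for getting the right answer
    return score

# e.g. reward("<think>17*24 = 17*20 + 17*4 = 408</think> The answer is 408.", "408") -> 1.5
```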
I think I get that "reasoning" in this context refers to dynamically budgeting scratchpad tokens that aren't intended as the main response body. But can't any model do that, with it just being part of the system prompt, or more generally the conversation scaffold that is being written to (something like the prompt-only sketch at the end of this comment)?
Or does a "reasoning model" specifically refer to models whose "post training" / "fine tuning" / "rlhf" laps have been run against those sorts of prompts rather than simpler user-assistant-user-assistant back and forths?
E.g., a base model becomes "a reasoning model" after enough time in the reasoning mines.
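For concreteness, the prompt-only version I mean is something like this; the client, model name, and tag names are just illustrative, and any instruction-following model would do:

```python
# Prompt-only scratchpad: the system prompt asks for reasoning inside tags,
# and the scaffold strips those tags before showing the user anything.
import re
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Before answering, reason step by step inside <scratchpad>...</scratchpad> "
    "tags. After the closing tag, give only the final answer."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "A train leaves at 3:40pm and the trip takes 95 minutes. When does it arrive?"},
    ],
)
raw = resp.choices[0].message.content
answer = re.sub(r"<scratchpad>.*?</scratchpad>\s*", "", raw, flags=re.DOTALL)
print(answer)
```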