It's not even "nearly as good as o1". They only compared it to the older GPT-4o.

You can safely assume Qwen2.5-Max will score worse than all of the recent reasoning models (o1, DeepSeek-R1, Gemini 2.0 Flash Thinking).

It'll probably become a very strong model if/when they apply RL training for reasoning. However, all the successful recipes for this are closed source, so it may take some time. In the meantime they could do SFT on another model's reasoning chains, though the DeepSeek-R1 technical report noted that this approach is not as good as RL training.

