Are you insinuating Gemini is similar in performance to o3-mini?

panarky
I've only had o3-mini for a day, but Gemini 2.0 Flash Thinking is still clearly better for my use cases.

And it's currently free in aistudio.google.com and in the API.

And it handles a million tokens.

Definitely varies by application, but the blind "taste test" vibes are very good for Gemini: https://lmarena.ai/?leaderboard
anabab
That reminds me: a week ago there was a post on Reddit (now deleted, though a copy of the content survives in the comments) where the author claimed to have manipulated voting on lmarena in favor of Gemini. The goal was to tip the scales on a Polymarket question along the lines of "which AI model will be the best one by $date", with the outcome decided by lmarena scores; they supposedly made on the order of USD 10k.

Original deleted post: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...

A copy of the content: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...

gerdesj
Are you implying it isn't?

(evidence please, everyone)

BinRoo OP
Simple example: o3-mini-high gets this [1] right, whereas Gemini 2.0 Flash 01-21 gets it wrong.

[1] https://chatgpt.com/share/679d9579-5bb8-8008-ac4a-38cef65b45...
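
The share link is truncated above, so as a rough, invented illustration only (not the actual code from the linked chat), the test is the kind where a function hides an unguarded edge case that a model ought to flag without being asked:

    # Purely illustrative sketch -- the function, name, and values here
    # are invented, not taken from the original ChatGPT share.
    def average_latency(samples: list[float]) -> float:
        # Latent bug a careful reviewer should warn about unprompted:
        # an empty `samples` list raises ZeroDivisionError.
        return sum(samples) / len(samples)

    if __name__ == "__main__":
        print(average_latency([12.0, 15.5, 9.8]))  # fine: ~12.43
        try:
            average_latency([])                    # the unguarded case
        except ZeroDivisionError:
            print("empty input raises ZeroDivisionError")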

Great example. Thank you. Can confirm that none of the Gemini models warned about the exception without prompting.
This agrees with my limited testing so far, though along a different axis: o3-mini is better at coding and objective tasks, while the most recent 2.0 Flash Thinking is stronger at subjective tasks. Similarly, o3-mini seems better at shorter outputs, but drops off as they get longer, tending to be lazy.
