Are you insinuating Gemini is similar in performance to o3-mini?

panarky
I've only had o3-mini for a day, but Gemini 2.0 Flash Thinking is still clearly better for my use cases.

And it's currently free in aistudio.google.com and in the API.

And it handles a million tokens.

Definitely varies by application, but the blind "taste test" vibes are very good for Gemini: https://lmarena.ai/?leaderboard
anabab
That reminds me: a week ago there was a post on Reddit (now deleted, though a copy of the content survives in the comments) where the author claimed to have manipulated voting on lmarena in favor of Gemini. The goal was to tip the scales on a Polymarket question along the lines of "which AI model will be the best one by $date", with the outcome decided by lmarena scores; they supposedly made on the order of USD 10k.

Original deleted post: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...

A copy of the content: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...

gerdesj
Are you implying it isn't?

(evidence please, everyone)

BinRoo OP
Simple example: o3-mini-high gets this [1] right, whereas Gemini 2.0 Flash 01-21 gets it wrong.

[1] https://chatgpt.com/share/679d9579-5bb8-8008-ac4a-38cef65b45...
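
The share link is truncated above, so as a rough, invented illustration only (not the actual code from the linked chat), the test is the kind where a function hides an unguarded edge case that a model ought to flag without being asked:

    # Purely illustrative sketch -- the function, name, and values here
    # are invented, not taken from the original ChatGPT share.
    def average_latency(samples: list[float]) -> float:
        # Latent bug a careful reviewer should warn about unprompted:
        # an empty `samples` list raises ZeroDivisionError.
        return sum(samples) / len(samples)

    if __name__ == "__main__":
        print(average_latency([12.0, 15.5, 9.8]))  # fine: ~12.43
        try:
            average_latency([])                    # the unguarded case
        except ZeroDivisionError:
            print("empty input raises ZeroDivisionError")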

Great example. Thank you. Can confirm that none of the Gemini models warned about the exception without prompting.
This agrees with my limited testing so far, though along a different axis: o3-mini is better at coding and objective tasks, while the most recent 2.0 Flash Thinking is stronger at subjective tasks. Similarly, o3-mini seems better at shorter outputs, but drops off as they get longer, tending to be lazy.
