Are you insinuating Gemini is similar in performance to o3-mini?
I've only had o3-mini for a day, but Gemini 2.0 Flash Thinking is still clearly better for my use cases.
And it's currently free in aistudio.google.com and in the API.
And it handles a million tokens.
Definitely varies by application, but the blind "taste test" vibes are very good for Gemini: https://lmarena.ai/?leaderboard
That reminds me: a week ago there was a post on Reddit (now deleted, but a copy of the content survives in the comments) whose author claimed to have manipulated voting on lmarena in favor of Gemini. The goal was to tip the scales on a Polymarket question along the lines of "which AI model will be the best one by $date" (with the outcome decided by lmarena scores), and they claimed to have made on the order of USD 10k doing so.
Original deleted post: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...
A copy of the content: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...
Are you implying it isn't?
(Evidence, please, everyone.)
A simple example: o3-mini-high gets this [1] right, whereas Gemini 2.0 Flash Thinking 01-21 gets it wrong.
[1] https://chatgpt.com/share/679d9579-5bb8-8008-ac4a-38cef65b45...
Great example. Thank you. Can confirm that none of the Gemini models warned about the exception without prompting.