youreth4tguy
I notice I trust LLMs a lot nowadays: I spent a long time trying to understand the answer to the first question, thinking I was just "not getting it", until I realized the answer was simply wrong, and then that the question didn't even make sense in the slightest.
The question is perfectly valid: https://math.stackexchange.com/questions/60050/find-a-functi...
The paper notes GPT-4 can solve it (they seem to have asked ChatGPT 3.5; this paper is old by AI standards, the first version being from Dec 2023).
Yeah, they aren't that good at spotting wrong questions (tho better than a year ago). Claude is especially likely to handle this correctly. GPT o-whatever will push back, but wrongly. Something in Gemini positions it as an all-knowing expert rather than a tool for exploring new true statements. It ends almost everything with "is there anything else I can explain about the distribution of algebraic numbers with a given height".
Actually, I just tried one of my test questions on GPT o3-mini-high and it got it. Very nice. Back in the lead over Claude. (My last check was with o1, although if they used the whole chat history to train or fine-tune, that wouldn't count: I gave the answer in that chat and in the end forced o1 to accept it. Lots of arm-twisting language tho.)