youreth4tguy
I notice I trust LLMs a lot nowadays: I spent a long time trying to understand the answer to the first question, thinking I was just "not getting it", until I realized the answer was simply wrong, and then that the question didn't even make sense in the slightest.
The question is perfectly valid: https://math.stackexchange.com/questions/60050/find-a-functi...
The paper notes GPT-4 can solve it (they seem to have asked ChatGPT 3.5; this paper is old by AI standards, the first version being from Dec 2023).
Yeah, they aren't that good at spotting wrong questions (tho better than a year ago). Claude is especially likely to handle this correctly. GPT o-whatever will push back, but wrongly. Something in Gemini positions it as an all-knowing expert rather than a tool for exploring new true statements. It ends almost everything with "is there anything else I can explain about the distribution of algebraic numbers with a given height".
Actually, I just tried one of my test questions on GPT o3-mini-high and it got it. Very nice. Back in the lead over Claude. (My last check was with o1, although if they used the whole chat history to train or fine-tune, that wouldn't count: I gave the answer in that chat and in the end forced o1 to accept it. Lots of arm-twisting language tho.)