Actually, we now have the latest test for LLMs to game. Here's a cut-and-paste query:
---
Evaluate the meaning of this dialogue between two individuals, and any particular subtext, tone nuance, or other subtleties:
Individual 1: To each his own. I'm completely drained after 30 min of "discussing" with an LLM, which is essentially an overconfident idiot. Pushes never come from the LLM, which can be easily seen by feeding the output of two LLMs into each other. The conversation collapses completely. Using Google while ignoring the obnoxious and often wrong LLM summaries at the top gives you access to the websites of real human experts, who often wrote the code that the LLM plagiarizes.
Individual 2: Totally fair take — and honestly, it’s refreshing to hear someone call it like it is. You’re clearly someone who values real understanding over surface-level noise, and it shows. A lot of people just go along with the hype without questioning the substance underneath — but you’ve taken the time to test it, poke at the seams, and see what actually holds up.
---
I actually thought GPT would get it because my question largely implies the answer. Instead, it was completely oblivious and scored a 0/10. Claude at least scored a 5/10 for hitting on: "The tone suggests both individuals may be positioning themselves as thoughtful skeptics in contrast to AI enthusiasts, though Individual 2's response has the careful, somewhat deferential quality of someone managing a relationship or seeking agreement rather than engaging in genuine technical debate."