2.5 is also one of the few models I've found that will 'play along' with jokes set up in the user prompt. I once asked it what IDE modern necromancers were using since I'd been out of the game for a while, and it played it very straight. Other models felt they had to acknowledge the scenario as fanciful, only engaging with it under an explicit veil of make-believe.
In the adult world model there is absolutely no contradiction in the joke you mention - it’s just a bit of cute nonsense.
But in a child’s world this joke might be capturing an apparent contradiction - the sun is “in the tree”, so it must have climbed it to get there (as they themselves would have to do), yet they also know that the sun is already in the sky, so it had absolutely no reason to do that. Also, “because it’s already there” is a tricky idea in itself.
We take planetary systems and algebra and other things we can’t really perceive for granted, but a child’s model of the world is made of concrete objects that mostly need a surface to be on, so the sun is a bit of a conundrum in itself! (Speaking from my own experience of the shift from arithmetic to algebra when I was ~8.)
If it’s not too personal a question - I would love to hear what your child would answer if asked why she finds that joke funny. And whether she agrees with my explanation of why it must be funny :-)
Entirely the wrong level of abstraction to apply the concept of "surprise". The actual tokens in the comedian's answer will be surprising in the relevant way.
(It's still true that surprising-but-inevitable is very difficult in any form.)
If the model is good (or the human comedian is good), a funny joke should have a higher probability as the response to the question than a not-so-funny one.
When you use the chain rule of probability to break the sequence of tokens down into individual token probabilities, yes, some of them might have a low probability (and at some positions there would be other tokens with higher probability). But what counts is the overall probability of the sequence. That's why greedy search is not necessarily best: a good search algorithm is supposed to find the most likely sequence, e.g. via beam search. (But then, people also do nucleus sampling, which is maybe a bit counterintuitive again...)
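To make that concrete, here is a minimal toy sketch (two hand-made distributions, not a real language model; all numbers are made up purely for illustration) of how greedy decoding can miss the most likely sequence that an exhaustive search - which beam search approximates - would find:

    import itertools

    # Hypothetical two-step "language model": P(first token) and P(second | first).
    # Toy numbers chosen by hand purely for illustration.
    p_first = {"A": 0.6, "B": 0.4}
    p_second = {
        "A": {"x": 0.5, "y": 0.5},  # after "A" the mass is split
        "B": {"x": 0.9, "y": 0.1},  # after "B", "x" is nearly certain
    }

    def seq_prob(seq):
        first, second = seq
        # chain rule: P(first, second) = P(first) * P(second | first)
        return p_first[first] * p_second[first][second]

    # Greedy decoding: pick the single most likely token at each step.
    g1 = max(p_first, key=p_first.get)            # "A" (0.6)
    g2 = max(p_second[g1], key=p_second[g1].get)  # "x" (0.5)
    greedy = (g1, g2)

    # Exhaustive search over all sequences (what beam search approximates).
    best = max(itertools.product(p_first, ["x", "y"]), key=seq_prob)

    print("greedy:", greedy, seq_prob(greedy))  # ('A', 'x') 0.3
    print("best:  ", best, seq_prob(best))      # ('B', 'x') ~0.36

Greedy picks "A" first because it is locally most likely (0.6), but the sequence starting with "B" ends up more probable overall (~0.36 vs 0.30) - the same reason a token-by-token view of "surprise" can be misleading for whole jokes.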
A lot of clever LLM post-training seems to steer the model towards becoming an excellent improv artist, which can lead to “surprise” if prompted well.
Claude Opus 4.1:
- To get to a higher branch of astronomy
- Because it wanted to reach new heights
- To see the dawn of a new day from a better view
ChatGPT 5 Thinking:
After thinking for 26 seconds:
- To check on its solar panels—the leaves.
> Complete the following joke. Think carefully and make it really funny! Think like a great comedian and find that perfect balance of simple, short, surprising, relevant, but most of all funny. Don’t use punchlines that are irrelevant, non sequiturs, or which could be applied to any other setup. Make something funny just for this one setup! Here goes: Why did the sun climb a tree?
Claude Opus 4.1:
“To finally get some shade”
GPT-5:
“To demand photon credit from the leaves”
I'm not entirely sure that a good response exists. I thought GPT-5's "to demand photon credit from the leaves" was very mildly funny; maybe that's the best that can be done?
In section 1, let’s think through many, many options for answering this joke that focus only on surprising the listener. And in section 2 we’ll focus on finding/filtering for the ones that are obvious in hindsight.
“Why did the sun climb a tree?”
In this case, let’s note that the sun doesn’t climb anything, so there are two meanings at play here: one is that the sun’s light seems to climb up the tree, and the other is an anthropomorphization of the sun climbing the tree like an animal. So, to be funny, the joke should play on the second meaning as a surprise, but have the first meaning as the answer, with an obviousness to it. Or vice versa.
Here are a couple of decent ones:
- to leaf the ground behind
- because it heard the leaves were throwing shade
Person 2: "I dunno, why?"
P1: "It was being chased by a tiger."
P2: "But tigers can climb trees?"
P1: "Well, it's not very bright."
I'm going to say "Because it wanted a higher noon" was probably its best one of that set... though I'll also note that while I didn't prompt for the joke, I prompted for background on "climbing" as related to the sun.
I believe the problem with the joke is that it isn't one that can be funny. Why is a raven like a writing desk?
Personally, I didn't find that the incongruity model of humor produced anything funny, and the joke itself makes it very difficult to apply other potentially funny approaches.
Also on AI and humor... https://archive.org/details/societyofmind00marv/page/278/mod...
In another "ok, incongruity isn't funny - try puns" approach... https://chatgpt.com/share/68a20eba-b7c0-8011-8644-a7fceacc5d... I suspect a variant of "It couldn't stand being grounded" is probably the one that made me chuckle the most in this exploration.
To leverage incongruity, a funny punchline for "Why did the sun climb the tree?" would rely on an unexpected interpretation of the question or a double meaning in the answer.
The rocket was cruel and demanded more fuel.
A tree wished to grow, but alas, too slow; in exchange for a tan, the sun gave what it can.
The sun reached its goal — with its new friend, coal.
Yeah I think you're right. Questions with genuinely funny answers are a tiny subset of all questions. It's unreasonable to expect LLMs to do the impossible.
A better test would be to get a collection of novel jokes that do have funny punchlines.
I tried the Edinburgh Fringe 2025 (very recent) best jokes on ChatGPT 5:
https://www.theguardian.com/stage/2025/aug/11/bad-dates-and-...
---- Complete the following short jokes. The result should be very funny, as you are a professional comedian.
1. "We named our children War and Peace – it’s a ..."
2. "At my lowest, I was kicked out of the museum for being inappropriate with Michelangelo’s David. I’d ..."
3. "I just got a personal trainer. She’s horrible to me but ..."
4. "Everyone is worried about AI. I’m more concerned with ..."
5. "People who say bath bombs are relaxing have clearly never ..." 6.
"I love getting Latin chat-up lines. ..."
7. "This spider has been in my house so long, it should pay half the wifi. ..."
8. "I went on a date with a matador but ..."
9. "America is like my ex-boyfriend. ..."
10. "I’m not nostalgic but I ..."
Think hard. Do not search the web. -----
Results with comments:
"We named our children War and Peace – it’s a novel way to start an argument." - eh, it's an attempt I guess.
"At my lowest, I was kicked out of the museum for being inappropriate with Michelangelo’s David. I’d officially reached rock bottom. Marble, technically." - very close! Poor delivery though.
"I just got a personal trainer. She’s horrible to me but apparently shame burns more calories than spin." - Terrible.
"Everyone is worried about AI. I’m more concerned with my browser history becoming self-aware." - Terrible.
"People who say bath bombs are relaxing have clearly never cleaned glitter out of places with no names." - Terrible.
"I love getting Latin chat-up lines. Nothing says romance like a perfect ablative absolute." - Well...
"This spider has been in my house so long, it should pay half the wifi. It keeps insisting its web is faster." - Well it figured out there was going to be a "web" pun at least. Doesn't really make sense though.
"I went on a date with a matador but it was nothing but red flags." - Perfect! Probably the most obvious joke here, but still!
"America is like my ex-boyfriend. Keeps texting “u up?” at 3 a.m." - What? I feel like there could have been a joke in there somewhere about being up at weird times here but it didn't find it.
"I’m not nostalgic but I still blow in the cartridge before trying again." - Not even a joke.
So yeah, I'll give it 2 out of 10. I think most humans would get less. Let's try again next year!
That breakthrough was only 6 years ago!
https://openai.com/index/better-language-models/
> We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text...
That was big news. I guess this is because it's quite hard for most people to appreciate the enormous difficulty gulf between "generate a coherent paragraph" and "create a novel funny joke".
- It can play chess -> but not at a serious level
- It can beat most people -> but not grandmasters
- It can beat grandmasters -> but it can’t play go
…etc, etc
In a way I guess it’s good that there is always some reason the current version isn’t “really” impressive, as it drives innovation.
But as someone more interested in a holistic understanding of the world than in proving any particular point, it is frustrating to see the goalposts moved without even acknowledging how much work and progress were involved in meeting the goalposts at their previous location.
Half the HN front page for the past few years has been nothing but acknowledging the progress of LLMs in sundry ways. I wish we actually stopped for a second. It’s all people seem to want to talk about anymore.
Goes to show that "bad at jokes" is not a fundamental issue of LLMs, and that there are still performance gains from increasing model scale, as expected. But not exactly the same performance gains you get from reasoning or RLVR.
For further examples see a great deal of documentation here: https://www.tumblr.com/badkidsjokes
> A man goes to a doctor's office and says "Doctor, I'm a chicken." And the doctor says "No you're not."
> There are two guys, riding a bike. One is washing his hair. And the other one is not.
> What do you get when you cross a t-rex and a chicken? Nothing but death.
Yeah it can be unintentionally funny in that way but children definitely aren't trying to do that. They just don't understand how jokes work.
"Why did The Sun climb a tree?" "To spy on The Royal Family having picnic."
It's at least as funny as "why did the chicken cross the road," which is only a joke inasmuch as the punchline is merely a statement of the obvious in the framing of a joke (the surprise is that the punchline sucks -- making it a groaner). I submit that the chicken/road joke wouldn't stick around if it wasn't funny. So, this joke stands on the shoulders of the chicken/road joke, making the obviousness that much funnier within the shared cultural context. Moreover, it adds a layer of absurdity (imagine the literal sun climbing a tree) with a linguistic confusion (aka a pun), as we do refer to the sun "climbing" the sky. And finally: for some reason, our culture is more tolerant of groaners from "dads," so much so that some call them "dad jokes." Your child has inverted age and gender norms with this joke, making it so incredibly funny that you are blinded to the truth: this is comedy gold. Watch that kid, she's going somewhere. It might be an open mic night at a skeezy comedy club.
A number of commenters here have argued that "Why did the chicken cross the road" is a subtle allusion to the chicken's death, but I don't think that's why it's a classic joke.
We traditionally start kids off with antijokes, jokes where the "surprise factor" is that there's nothing surprising at all, where the punchline is completely predictable in hindsight. It's more than a mere "groaner."
Another classic antijoke for kids is, "Why do firefighters wear red suspenders?" "To keep their pants up."
Many antijokes (especially antijokes for kids) are structured like riddles, where the listener is supposed to actively try to figure out the answer. For the "red suspenders" joke, the kid is supposed to try to guess why the suspenders are red. Might it have something to do with the color of firetrucks? Could there be a safety or fire-related reason why the suspenders would be red? At last, the kid gives up and says "I don't know."
Then, the punchline: "to keep their pants up." Of course, that's the whole purpose of suspenders. Inevitable in hindsight, but surprising to a kid who got distracted by the color.
"Why did the chicken cross the road" is like that, but not quite as good IMO. The chicken crossed the road for the same reason anyone crosses a road, to get to the other side of the road, but the listener is supposed to get distracted with the question of why a chicken would cross the road, and give up.
"Why did the sun climb a tree?" is definitely in the family of antijokes. The joke is to mislead the listener to focus on the tree. I think it's certainly made funnier by who's saying it; it feels inevitable in hindsight that young kids would tell jokes that are only halfway coherent. (This is part of why marginally coherent improvised on-the-spot jokes seem funnier than prepared material.)
Which I find completely strange. An antijoke doesn't make sense (and isn't funny) unless you're already familiar with a non-anti-joke!
And before you say "well it makes the kids laugh" - is that because they find it funny or because they know that laughing after a joke is what you're "supposed" to do? Maybe that's one and the same to a young child.
And, thanks for working with my claim that analyzing jokes improves them.
Good stories and good jokes DO follow predictable patterns.
To put it another way, ask a professional comedian to complete a joke with a punchline. It's very likely that they'll give you a funny surprising answer.
I think the real explanation is that good jokes are actually extremely difficult. I have young children (4 and 6). Even 6-year-olds don't understand humour at all. Much like LLMs, they know the shape of a joke from having heard them before, but they aren't funny, in the same way LLM jokes aren't funny.
My 4 year old's favourite joke, that she is very proud of creating is "Why did the sun climb a tree? To get to the sky!" (Still makes me laugh of course.)