Comment by skeeter2020

skeeter2020 Nov 20, 2025

>> - Put a strawberry in the left eye socket.
>> - Put a blackberry in the right eye socket.

>> All five of the edits are implemented correctly

This is a GREAT example of the (not so) subtle mistakes AI will make in image generation, or code creation, or your future knee surgery. The model placed the specified items in the eye sockets based on the viewer's left/right; when we talk relative in this scenario we usually (always?) mean from the perspective of the target or "owner". Doctors make this mistake too (they typically mark the correct side with a sharpie while the patient is still alert) but I'd be more concerned if we're "outsourcing" decision making without adequate oversight.

https://minimaxir.com/2025/11/nano-banana-prompts/#hello-nan...

oasisbob Nov 20, 2025

There's a classic well-illustrated book, _How to Keep Your Volkswagen Alive_, which spends a whole illustrated page at the beginning building up a reference frame for working on the vehicle. Up is sky, down is ground, front is always vehicle's front, left is always vehicle's left.

Sounds a bit silly to write it out, but the diagram did a great job removing ambiguity when you expect someone to be lying on the ground in a tight place looking backwards, upside down.

Also feels important to note that in the theatre, there is stage-right and stage-left, jargon to disambiguate even though the jargon expects you to know the meaning to understand it.

bo1024 Nov 21, 2025

Port and starboard

I guess car people use “driver side” and “passenger side”, but the same car might be sold in mirror image versions.

CGMthrowaway Nov 20, 2025

>This is a GREAT example of the (not so) subtle mistakes AI will make in image generation, or code creation, or your future knee surgery.

The mistake is in the prompting (not enough information). The AI did the best it could

"What's the biggest known planet" "Jupiter" "NO I MEANT IN THE UNIVERSE!"

sebzim4500 Nov 20, 2025

It doesn't affect your point, but since the IAU are insane, exoplanets technically aren't planets, so Jupiter is the largest planet in the universe.

MangoToupe Nov 20, 2025

I suppose it was too much to hope that chatbots could be trained to avoid pointless pedantry.

fragmede Nov 20, 2025

They've been trained on every web forum on the Internet. How could it be possible for them to avoid that?

throawayonthe Nov 20, 2025

asking "x-most known y" and not expecting a global answer is odd

kridsdale3 Nov 20, 2025

Every answer concerning planets is global.

retsibsi Nov 21, 2025

Maybe! https://en.wikipedia.org/wiki/Toroidal_planet

bigstrat2003 Nov 20, 2025

No, this is squarely on the AI. A human would know what you mean without specific instructions.

siffin Nov 20, 2025

Seems like you're making a judgment based on your own experience, but as another commenter pointed out, it was wrong. There are plenty of us out there who would confirm, because people are too flawed to trust. Humans double/triple check, especially under higher stakes conditions (surgery).

Heck, humans are so flawed, they'll put the things in the wrong eye socket even knowing full well exactly where they should go - something a computer literally couldn't do.

rullelito Nov 20, 2025

Why on earth would the fallback when a prompt is under specified be to do something no human expects?

emp17344 Nov 21, 2025

“People are too flawed to trust”? You’ve lost the plot. People are trusted to perform complex tasks every single minute of every single day, and they overwhelmingly perform those tasks with minimal errors.

siffin Nov 22, 2025

Extremely talented, studied, hard working humans perform complex tasks all the time, and never with 100% win rate over all time.

In other examples, almost every single person has had the experience of saying, "turn right", "oh I meant left sorry, I knew it was right too, I don't know why I said left". Even the most sophisticated humans have made this error. A computer would never.

Humans are deeply flawed and after pre-selection require expensive training to perform complex tasks at a never perfect success rate.

rodrigodlu Nov 20, 2025

Intelligence in my book includes error correction. Questioning possible mistakes is part of wisdom.

So the understanding that AI and HI are different entities altogether, with only a subset of communication protocols between them, will become more and more obvious, as some comments here already implicitly suggest.

danso Nov 20, 2025

If the instructions were actually specific, e.g. "Put a blackberry in its right eye socket", then yes, most humans would know what that meant. But the instructions were not that specific: "in the right eye socket".

TylerE Nov 20, 2025

Or be even more explicit: Put a strawberry in the person’s right eye socket.

adastra22 Nov 20, 2025

If you asked me right now what the biggest known planet was, I'd think Jupiter. I'd assume you were talking about our solar system ("known" here implying there might be more planets out in the distant reaches).

CGMthrowaway Nov 20, 2025

I would be amused to see you test this theory with 100 men on the street

jaggederest Nov 20, 2025

I would not, I would clarify, and I think I'm a human.

nkmnz Nov 20, 2025

Yeah, just like humans always know what you mean.

recursive Nov 20, 2025

But different humans would know what you meant differently. Some would have known it the same way the AI did.

0x457 Nov 20, 2025

Right, that's why one should use "put a strawberry in the portside eye socket" and "put a strawberry in the starboard side socket"

iammattmurphy Nov 20, 2025

When in doubt, always use nautical terminology

crazygringo Nov 21, 2025

> when we talk relative in this scenario we usually (always?) mean from the perspective of the target or "owner".

I dunno... I feel pretty confident 99 percent of people would do the same thing, and put the strawberry in the eye socket to our left, the viewer's.

You really have to be trained explicitly to put yourself in the subject's shoes, and very few people are. To me, the model is correctly following the instructions most people will mean.

And it's not even incorrect. "The left x" is linguistically ambiguous. If you say "the left flower", it's obviously the flower to our left. So when you say "the left eye socket", the eye socket to our left is a valid interpretation. If they had said their or its left eye socket, then it's more arguable that it must be from the subject's side. But that's not the case in this example.

threetonesun Nov 21, 2025

There's a puzzle in the latest Indiana Jones game that exploits the fact that yes, most people would do the same thing.

Jabrov Nov 20, 2025

I don't know if that's so much a mistake as it is ambiguity though? To me, using the viewer's perspective in this case seems totally reasonable.

Does it still use the viewer's perspective if the prompt specifies "Put a strawberry in the _patient's left eye_"? If it does, then you're onto something. Otherwise I completely disagree with this.

ComputerGuru Nov 20, 2025

“Eye on the left” is different from “the left eye”. First can be ambiguous, second really isn’t.

simonw Nov 20, 2025

I think "the left eye" in this particular case (a photo of a skull made of pancake batter) is still very slightly ambiguous. "The skull's left eye" would not be.

Dylan16807 Nov 21, 2025

Interesting, because I would say the opposite. "On the left" suggests left of image, "the left eye" could be any version of left.

recursive Nov 20, 2025

I guess there's some ambiguity regarding whether or not this can be ambiguous. Because it seems like it can to me.

withinboredom Nov 20, 2025

“The right socket” can only be implied one way when talking about a body just like you only have one right hand despite the fact that it is on my left when looking at you.

marcellus23 Nov 20, 2025

I think the fact that anyone in this thread thinks it's ambiguous is proof by definition that it's ambiguous.

pphysch Nov 20, 2025

"Plug into right power socket"

Same language, opposite meaning because of a particular noun + context.

I think the only thing obvious here is that there is no obvious solution other than adding lots of clarification to your prompt.

withinboredom Nov 20, 2025

I think you missed the entire point?

swores Nov 20, 2025

No, they just disagree with you.

esrauch Nov 21, 2025

"Right hand" is practically a bigram that has more meaning, since handedness is such a common topic.

Also context matters, if you're talking to someone you would say "right shoulder" for _their_ right since you know it's an observer with different vantage point. Talking about a scene in a photo "the right shoulder" to me would more often mean right portion of the photo even if it was the person's left shoulder.

Dylan16807 Nov 21, 2025

Having one person in the frame isn't enough to unambiguously put us into the "talking about a body" context.

lifthrasiir Nov 20, 2025

That was a big problem when I was toying around with the original Nano Banana. I always prompted from the perspective of the (imaginary) camera, and yet NB often interpreted that as the perspective of the target, giving no way to select the opposite side. Since the selected side is generally closer to the camera, my usual workaround is to force the side far from the camera. And yet that was not perfect.

minimaxir Nov 20, 2025

I meant to add a clarification to that point (because the ambiguity is a valid counterpoint), thanks for the reminder.
