The models can generate hyper-realistic renders of pelicans riding bikes in PNG format. They also have perfect knowledge of the SVG spec, and comprehensive knowledge of most human creative endeavours. They should be able to produce astonishing results for this request.
I don’t want to see a chunky icon-styled vector graphic. I want to see one of these models meticulously paint what is unambiguously a pelican riding what is unambiguously a bicycle, to a quality on par with Michelangelo, using the SVG standard as its medium. And I don’t just want it to define individual pixels. I want brush strokes building up a layered and textured bird's wing.
If you train for your first marathon, is your goal to run it under 2h?
We would all love perfect results, but our standards are reasonable. We know what the results looked like last month, and we judge the velocity of improvement.
Nobody thinks that's a good SVG of a pelican riding a bike on its own. But it's a lot better than all the other LLM-generated SVGs of a pelican riding a bike.
We judge relative results; you judge absolute results. Confusion ensues.
To use your marathon metaphor, they have the body of Kipchoge in his absolute prime, and are failing to qualify for a local fun-run.
How well do you reckon you could draw a pelican on a bicycle by typing out an SVG file blind?
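To make the "typing blind" point concrete, here is a sketch of what the exercise actually demands: hand-writing SVG markup with shapes and coordinates chosen purely by mental arithmetic, never seeing the render. The shapes and coordinates below are my own guesses, not output from any model, and the check only confirms well-formed markup, not visual quality.

```python
# A minimal hand-typed SVG: a crude bicycle (two wheels and a frame)
# plus a blob-and-beak "pelican". Every coordinate is a blind guess,
# which is exactly the difficulty the benchmark probes.
import xml.etree.ElementTree as ET

svg = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">
  <circle cx="50" cy="90" r="22" fill="none" stroke="black"/>
  <circle cx="140" cy="90" r="22" fill="none" stroke="black"/>
  <path d="M50 90 L85 60 L140 90 M85 60 L95 90" stroke="black" fill="none"/>
  <ellipse cx="85" cy="45" rx="18" ry="12" fill="white" stroke="black"/>
  <path d="M100 40 l20 5 l-20 5 z" fill="orange" stroke="black"/>
</svg>"""

# Well-formedness is the easy part; it says nothing about whether the
# rendered result actually looks like a pelican on a bicycle.
root = ET.fromstring(svg)
print(root.tag.endswith("svg"), len(list(root)))  # True 5
```

Parsing succeeds trivially; judging whether those five elements read as "pelican on a bicycle" is the part humans and models alike find hard without a render loop.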
https://www.behance.net/gallery/29122113/Pelican-on-bikes-wi...
There are other such images. Not an image model? How do we know they don't convert all images to SVG and train an LLM on them? How do we know they don't cheat on this benchmark by routing the query to an image model first?
But since everything is closed source, with any number of potential special-case hacks, we won't know.
Looks like complete crap to me.