That's good?

Looks like complete crap to me.


Here's my collection from the past year. It's definitely better than any of these! https://simonwillison.net/tags/pelican-riding-a-bicycle/
Ok, so we're in the dancing pig stage now. We appreciate that the pig can dance, not how well it dances.
It's quite literally the opposite. Simon is tracking how well the "pig" dances as each model gets better (or worse) at it
I like the pelican riding a bike test, but my standards for what’s “good” seem higher than generally expected by others.

The models can generate hyper realistic renders of pelicans riding bikes in png format. They also have perfect knowledge of the SVG spec, and comprehensive knowledge of most human creative artistic endeavours. They should be able to produce astonishing results for the request.

I don’t want to see a chunky icon-styled vector graphic. I want to see one of these models meticulously paint what is unambiguously a pelican riding what is unambiguously a bicycle, to a quality on par with Michelangelo, using the SVG standard as a medium. And I don’t just want it to define individual pixels. I want brush strokes building up a layered and textured bird’s wing.

It’s not true AGI until it can recreate the emotional state of Van Gogh when he cut his ear and express the pain through the brush, in SVG format.
>I like the pelican riding a bike test, but my standards for what’s “good” seem higher than generally expected by others.

If you train for your first marathon, is your goal to run it under 2h?

We are all looking forward to perfect results, but our standards are reasonable. We know what the results were last month, and judge the improvement velocity.

Nobody thinks that's a good SVG of a pelican riding a bike on its own. But it's a lot better compared to all the other LLM-generated SVGs of a pelican riding a bike.

We judge relative results - you judge absolute results. Confusion ensues.

I think you’re missing the criticism I’m making. The models already have the capacity both to create hyper-real imagery, and they have mastery of the SVG medium. These two capabilities are the entire recipe a human would need to produce what I’ve described.

To use your marathon metaphor, they have the body of Kipchoge in his absolute prime, and are failing to qualify for a local fun-run.

But you're never going to get that out of the prompt that is being used to generate these Pelicans. You're judging it on something that's not even being attempted.
I was confused too at first. This is an SVG generated by an LLM - it's not from an image model.

How well do you reckon you could draw a pelican on a bicycle by typing out an SVG file blind?

I mean, how well do you reckon you can denoise a JPG by hand until it's a piece of art? That way of thinking isn’t helpful to understanding AI IMO
I didn't intend it as a general-purpose tool for understanding AI, but as an intuition pump for why this problem is hard for LLMs specifically.
In this case it is actually relevant. The ability to draw a pelican on a bicycle correctly depends a great deal on understanding not only what both look like in general, but on the spatial relationships between the various objects and their parts. Models that can draw this kind of thing better also tend to be better at tasks that require understanding of how things go together and interact in 3D space.
How do we know it's not just a mashup of existing pictures? All generated pelicans on bikes look somewhat cartoonish and use historical or artsy bikes. This is training material from 2015:

https://www.behance.net/gallery/29122113/Pelican-on-bikes-wi...

There are other such images. Not an image model? How do we know that they don't convert all images to svg and train an LLM on it? How do we know that they do not cheat on this benchmark and route the query to an image model first?

"it's not impressive because they might have cheated" isn't a great argument.
The generated picture is not impressive and the excuse in this subthread was that an svg is created directly without using an image model. I offer alternative explanations why svg creation might not be impressive OR ALTERNATIVELY why they may have faked even a bad result because it is a popular benchmark (faking a perfect result would be too obvious).

But since everything is closed source with any number of potential special case hacks, we won't know.

Have you seen the current SVG art that LLMs generate? It's pretty comical what they output.
