Preferences

Nice work on these benchmarks Simon. I’ve followed your blog closely since your great talk at the AI Engineers World Fair, and I want to say thank you for all the high quality content you share for free. It’s become my primary source for keeping up to date.

I’ve been working on a few benchmarks to test how well LLMs can recreate interfaces from screenshots. (https://github.com/alechewitt/llm-ui-challenge). From my basic tests, it seems GPT-5.2 is slightly better at these UI recreations. For example, in the MS Word replica, it implemented the undo/redo buttons as well as the bold/italic formatting that GPT-5.1 handled, and it generally seemed a bit closer to the original screenshot (https://alechewitt.github.io/llm-ui-challenge/outputs/micros...).

In the VS Code test, it also added the tabs that weren’t visible in the screenshot! (https://alechewitt.github.io/llm-ui-challenge/outputs/vs_cod...).


That is a very good benchmark. Interesting to see GPT-5.2 delivering on the promise of better vision support there.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal