Comment by kurtis_reed

kurtis_reed Sep 29, 2025

Why did you have access to a preview?

simonw Sep 29, 2025

I get access to previews from OpenAI, Anthropic and Gemini pretty often. They're usually accompanied by an NDA and an embargo date - in this case the embargo was 10am Pacific this morning.

I won't accept preview access if it comes with any conditions at all about what I can say about the model once the embargo has lifted.

dzhiurgis Sep 30, 2025

Soooo that leaves xAI as the one with conditions

poopiokaka Sep 29, 2025 (dead)

Redster Sep 29, 2025

Simonw is a cheerful and straightforward AI journalist who likes to show and not just tell. He has done a good job aggregating and documenting the progress of LLM tools and models. As I understand it, OpenAI and Anthropic have both wisely decided to make sure he has up to date info because they know he'll write about it.

Thanks for all your work, Simon! You're my favorite journalist in this space and I really appreciate your tone.

tootie Sep 29, 2025

Simon has a popular blog, but he's also co-creator of Django and very well-known in the Python community.

michaelt Sep 29, 2025

> As I understand it, OpenAI and Anthropic have both wisely decided to make sure he has up to date info because they know he'll write about it.

And the wisest part is if he writes something they don't like, they can cut off that advanced access.

As is the longstanding tradition in games journalism, travel journalism, and suchlike.

simonw Sep 30, 2025

If they do that I'll go back to writing about them after they ship. Not a big loss for me at all.

tripzilch Sep 30, 2025

I get it, and you'd trust yourself on that, but it doesn't really matter whether you say it or not. What counts for your ongoing credibility is whether you preface every future blog post with a note on whether you got special access, a special deal, or sponsorship, or on the fact that you didn't get any of those things.

You're a reviewer. This is how reviewers stay credible. If you don't disclose your relationship with the thing or company you're reviewing, I'm probably better off assuming you're paid.

And if your NDA says you can't write that in your preface, then logically, it is impossible to write a credible review in the first place.

simonw Sep 30, 2025

I recently started doing that: https://simonwillison.net/about/#disclosures and https://simonwillison.net/tags/disclosures/

tripzilch Sep 30, 2025

Awesome, thanks a lot, that's important. But... sorry, I just checked those, and I do think it's better to do it on a per-article basis, because a lot of your audience (I'm guessing) comes from external links rather than from browsing your website.

This is (or should be) standard practice on YouTube review channels (the ones I would trust), and it's not a bad thing to remind people of on every occasion. Plus, it can function as a kind of "canary" in cases of particularly restrictive NDAs.

knowsuchagency Sep 29, 2025

I like Simon, but he's not a journalist. A journalist would not have gone to OpenAI to glaze the GPT-5 release with Theo. I don't say this to discount Simon -- I appreciate his writing and analysis but a journalist, he isn't.

simonw Sep 29, 2025

I don't call myself a journalist, partly because no publication is paying me to do any of this!

If I had an editor I imagine they would have talked me out of going to the OpenAI office for a mysterious product preview session with a film crew.

Redster Sep 29, 2025

That's a fair point. I feel like he's more than a blogger, but I'm not sure what the best term is!

LudwigNagasena Sep 29, 2025

An influencer.

kid64 Sep 30, 2025

Guys, he's standing right there

fourthark Sep 30, 2025

Argh

asadotzler Sep 29, 2025

AI blogger seems more appropriate than journalist.

nchmy Sep 29, 2025

Are you aware of any "AI journalists"? Because simonw does great work, perhaps blogger is what people should aspire to?

simonw Sep 29, 2025

I actually talk to journalists on the AI beat quite often - I've had good conversations with them at publications including The Economist, the NY Times, the Washington Post and Ars Technica.

They're not going to write up detailed reviews of things like the new Claude code interpreter mode though, because that's not of interest to a general enough audience.

I don't have that restriction: https://simonwillison.net/2025/Sep/9/claude-code-interpreter...

grim_io Sep 29, 2025

Not sure what an AI journalist is supposed to be or do, but the lack of one doesn't automatically promote someone who isn't one into that position.

landl0rd Sep 29, 2025

Kylie Robison recently moved to Wired and is a solid "AI journalist".

minimaxir Sep 29, 2025

Although she is indeed solid as an AI journalist, unfortunately she was recently let go for unknown reasons: https://www.kyliebytes.com/thank-god-i-got-fired/


rapfaria Sep 29, 2025

His "pelican riding a bicycle" tests are now a classic, and AI shops are benchmaxxing for them.

simonw Sep 29, 2025

They need to benchmaxxx a whole lot harder, the illustrations still all universally suck!

lxgr Sep 29, 2025

I fully expect a model to output an SVG made up of 1000x1000 rectangles (i.e. pixels) representing a raster image of a beautifully hand-drawn pelican riding a bicycle any day now :)

simonw Sep 29, 2025

I got an amazing result from ChatGPT a while back - an SVG with a perfect illustration of a pelican riding a bicycle.

It was suspiciously good in fact... so I downloaded the SVG file and found out it had generated a raster image with its image tool and then embedded it as base64 binary image data inside an SVG wrapper!
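For anyone who wants to check their own results for this trick, here is a minimal sketch (not something from the thread) using Python's standard-library `xml.etree.ElementTree`. It assumes the raster is embedded the way described above, as an `<image>` element whose `href` (or older `xlink:href`) is a base64 `data:` URI; the function names `embedded_rasters` and `rect_count` are hypothetical. `rect_count` is a rough signal for the "pixels as rectangles" variant mentioned upthread.

```python
import base64
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def _local(tag: str) -> str:
    # Strip any "{namespace}" prefix so both namespaced and bare SVGs match.
    return tag.rsplit("}", 1)[-1]


def embedded_rasters(svg_text: str) -> list[tuple[str, int]]:
    """Return (mime_type, decoded_size_in_bytes) for each base64 raster in an SVG."""
    root = ET.fromstring(svg_text)
    found = []
    for el in root.iter():
        if _local(el.tag) != "image":
            continue
        # SVG 2 uses a plain href attribute; older files use xlink:href.
        href = el.get("href") or el.get(XLINK_HREF) or ""
        if href.startswith("data:image/") and ";base64," in href:
            header, payload = href.split(",", 1)
            mime = header[len("data:"):].split(";", 1)[0]
            found.append((mime, len(base64.b64decode(payload))))
    return found


def rect_count(svg_text: str) -> int:
    """Count <rect> elements, a rough signal for the "pixels as rectangles" trick."""
    root = ET.fromstring(svg_text)
    return sum(1 for el in root.iter() if _local(el.tag) == "rect")


if __name__ == "__main__":
    # Just the 8-byte PNG signature, enough for a demo payload.
    payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
    disguised = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
        f'<image href="data:image/png;base64,{payload}" width="10" height="10"/>'
        "</svg>"
    )
    honest = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="5" cy="5" r="4"/></svg>'
    print(embedded_rasters(disguised))  # [('image/png', 8)]
    print(embedded_rasters(honest))     # []
```

This is only a heuristic: an SVG could also point at an external raster file, and a legitimate vector illustration can contain plenty of rectangles.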

dhhugley Sep 30, 2025

You’ll just have to move the goalpost then; perhaps it can be a multidimensional pelican saving the multiverse, or an invisible pelican that only you can see and critique.

lxgr Sep 30, 2025

How would that help, given that ChatGPT has apparently already figured out how to consistently and systematically game the benchmark by working in pixel space and only using SVG as a wrapper for a raster image?

FWIW, I could totally see a not hugely more advanced model using its native image generation capabilities and then running a vector extraction tool on the result, maybe iteratively. (And maybe I wouldn't consider that cheating anymore, since at some point that probably resembles what humans do?)

sixeyes Sep 30, 2025

I've gotten pixelated rectangle SVGs like that a few times.

Also with Cursor: when I asked it to "write me a script that outputs X as an svg", it gave me rectangles a few times.

astrange Sep 29, 2025

If they were testing for that, it'd work more often.

Other things you can ask that they're still clearly not optimizing for are ASCII art and directions between different locations. Complete fabrications 100% of the time.

Sharlin Sep 29, 2025

Well, I definitely hope they aren't trying to teach LLMs directions between locations, given what an idiotic use of compute and parameter space that would be. We already have excellent AIs for route planning. What they ought to optimize for is, of course, finally teaching them to say they don't know, or to automatically call a route-planning API when the user asks for directions.

minimaxir Sep 29, 2025

Simon tends to write up reports of new LLM releases (with great community respect) and it's much easier with lead time if the provider is able to set up a preview endpoint.

criddell Sep 30, 2025

I believe the criticism is that he's reporting on a pre-release LLM which isn't the same as the one you and I are going to be using a few weeks from now after they've downgraded it enough to work at scale.

lossolo Sep 29, 2025

The same reason YouTube reviewers and influencers get access to hardware or games before release. In this case, the person is a passionate blogger.

runjake Sep 30, 2025

simonw is Simon Willison, who’s well known for a number of things. But these days, he’s well known for his AI centric blog and his tools. The AI companies give him early access to stuff.

https://simonwillison.net/

kissgyorgy Sep 30, 2025

If you want to keep up with AI progress and model updates, simonw is the man to follow!

lomase Sep 29, 2025

They're an AI evangelist who told me I could replace any technical book with one created by an LLM.

They are a nice person.

rhizome Sep 29, 2025

You are correct, sir!

mvdtnz Sep 29, 2025 (dead)
