The output consistency is interesting. I just ran half a dozen generations of my standard image model challenge (to date I have yet to see a model that can render piano keyboard octaves correctly, and Gemini 2.5 Flash Image is no different in that regard), and as best I can tell, there are no changes at all between successive attempts: https://g.co/gemini/share/a0e1e264b5e9
This is in stark contrast to ChatGPT, where an edit prompt typically yields both requested and unrequested changes to the image; here it seems to yield neither.
Flash 2.0 Image had the same issue: it does better than gpt-image at maintaining consistency across edits, but that introduces a different failure mode, where it sometimes gets "locked in" on a particular reference image and struggles to make changes to it.
In some cases you'll pass in multiple images + a prompt and get back something that's almost visually indistinguishable from just one of the images and nothing from the prompt.
Wildly different and subjectively less "presentable", to be clear. The fashion example just generates a vague bubble shape with the subject inside it, instead of the "subject flying through the sky inside a bubble" presented on the site. The other case just adds the fork to the bowl of spaghetti. Both are reproducible.
Arguably they follow the prompt better than what Google is showing off, but at the same time look less impressive.
It does look like I'm using the new model, though: I'm getting image editing results that are well beyond what the old one was capable of.