
Why do Hunyuan, OpenAI 4o and Qwen get a pass for the octopus test? They don't cover "each tentacle", just some. And midjourney covers 9 of 8 arms with sock puppets.

vunderba
Good point. I probably need to make the pass criteria a bit stricter, especially as the models get better.

> midjourney covers 9 of 8 arms with sock puppets.

Midjourney is shown as a fail, so I'm not sure what your point is. And those don't look remotely like sock puppets; they resemble stockings at best.
