Taking a Second Look - Hacker Neue

4 points Aug 14, 2025

jeffreysmith OP Aug 14, 2025

Howdy, HN. Authors here. We got tired of text-to-image leaderboards that focus only on aesthetics, so we built our own benchmarks to test what matters for real work: fidelity to complex prompts, safety, bias, and IP infringement.

We analyzed 18 models and found that no single model is good at everything. For example, GPT-4o has the best safety guardrails but also a 98% IP infringement rate on celebrity likenesses. Google's Imagen 4 Ultra actively counters bias (e.g., 90% of its "CEOs" are female) but struggles with generating crowds. xAI's Grok 2 blocks almost nothing.

Lots more detail in the post. We'll be here all day to answer questions.

ianchenh Aug 14, 2025

Really unique viewpoint. Can't stress enough how rare it is these days for tech startups and companies to emphasize social responsibility and, crucially, its potential to translate into profitability as well! Responsible AI isn't just a constraint on the field - controllability means quality and usability.

