- Right, it only scores 3 points higher on image edit, which is within the margin of error. But on image generation, it scores a significant 29 points higher.
- This is true; however, LMArena does employ some methods to mitigate attempts to manipulate the leaderboard. See https://openreview.net/forum?id=zf9zwCRKyP
They also control for style: https://news.lmarena.ai/sentiment-control/
- That's not how the arena works. The evaluation is blind so Google's advertising/integration has no effect on the results.
- How did you get early access?
- OAI's latest image model outperforms Google's in LMArena in both image generation and image editing. So even though some people may prefer nano banana pro in their own anecdotal tests, the average person prefers GPT image 1.5 in blind evaluations.
- There are no watermarks in the arena.
- This outperforms Gemini 3 pro image (nano banana pro) on Text-to-Image Arena and Image Edit Arena. I'm surprised they didn't mention this leaderboard in the blog post.
I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).
- If you prefer a simpler style, then why did you write "the deeper I got into the world of literature" instead of "as I studied literature more"?
Why did you say you were "pushed towards" simpler language instead of "I liked it more"?
Why did you say "I feel the pain in my bones" and "drives me insane" instead of "I dislike it"?
Why did you say "the big boy SAT words should pop out of the page unaccompanied" instead of "there should only be one big word per page"?
Perhaps flowery language expands your ability to express yourself?
- Why put a time limit on exams? Why not put everyone on the same playing field by allowing unlimited time to take the exam? The majority of exams at my university have no time limit (within the operating hours of the testing center), and it works well. At the end of the day, if you don't know the material, having more time isn't going to help you.
- Why don't high schools teach every student these two things?
1. The miracle of markets (supply and demand, "the invisible hand," etc.)
2. The weakness of markets (incomplete information, monopoly, etc.)
- You are correct that, from a security standpoint, your software is no different than any other software I install on my computer, since desktop computers have no sandboxing. But from a privacy standpoint, it could be uniquely concerning.
With Google Drive, I choose which files to upload. It doesn't have broad access to everything on my computer.
Dropbox, iCloud, and OneDrive are just backup services, so in theory they could just back up your files as an encrypted blob and have no way to read them. Unfortunately, they don't encrypt them (which is partly why I don't use those services). But at least I have their "promise" that they won't read or analyze my files, which would make me feel better even if it's a weak promise.
On the other hand, your service, by nature, is reading and analyzing all of my files using a remote server.
- Thanks for the context - it casts the parent article in a different light.
- Is CS your passion? Stick with it. The job market isn't as good as it used to be, but it isn't as bad as people make it out to be. I am also pursuing a CS degree and I asked the same question here 6 months ago. Since then, this is what I've learned:
* It's likely that the slowing of the tech job market wasn't caused by AI, but by a change in the tax code (Section 174) and higher interest rates (companies over-hired during the pandemic when funding was abundant).
* LLMs may or may not increase developer productivity [1], and they definitely cannot replace software engineers entirely (and I don't think they ever will - but it depends who you ask).
* Anecdotally, finding a summer internship wasn't easy for me, but it also wasn't any harder than it was for my peers in other programs (engineering, finance, etc.). Job hunting is a skill that I think many people in CS don't have because it used to be easy.
* I used an agentic IDE extensively to code for my on-campus research job. I still enjoyed the job a lot, and even as a rookie developer, I felt I played a very valuable role that LLMs could not replace.
[1] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
- > The Casiotron (QW02-10 line) was priced at 58,000 yen, nearly a whole month’s starting salary for a university graduate in Japan back in 1974. The digital watch as a luxury item — another Casio innovation.
It is interesting to see something like this presented in a positive light.
- > they had archive.is blacklisted
What do you mean by this? Wikipedia actively encourages people to use archive.is links in citations:
https://en.wikipedia.org/wiki/Help:Archiving_a_source#Archiv...
- Here's 46% for 2030. It's had $350k in volume across the 4 markets.
https://kalshi.com/markets/kxoaiagi/openai-achieves-agi/oaia...