Oh, and I agree so much. I just shared a quick first observation from a real-world testing scenario (BTW, I re-ran Sonnet 4.5 with the same prompt and not much changed). I keep seeing LLM providers optimize for benchmarks, but then I cannot reproduce their results in my own projects.
I will keep trying, because Claude 4 is generally a very strong line of models. Anthropic sat on the AI coding throne for months before OpenAI dethroned them with GPT-5 and Codex CLI (and now GPT-5-Codex).
And sure, I do want to keep them competing so they push each other to get even better.
1. Different LLMs require different prompts and context.
2. They ignore LLMs' non-determinism; you should run the experiment several times (see the sketch after this list).
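To make point 2 concrete, here is a minimal sketch of what I mean by repeating the run; `call_model` and `score` are hypothetical stand-ins for whatever provider SDK and grading you actually use, not any real API:

```python
import statistics

N_RUNS = 5  # assumed repeat count; pick whatever your budget allows


def call_model(prompt: str) -> str:
    """Stand-in for your provider's SDK call (Claude, GPT-5, etc.) -- replace it."""
    return "dummy output"


def score(output: str) -> float:
    """Stand-in grading function, e.g. did the generated code pass your tests -- replace it."""
    return 1.0 if output else 0.0


def evaluate(prompt: str, runs: int = N_RUNS) -> None:
    # Run the identical prompt several times so one lucky or unlucky
    # sample does not decide the comparison between models.
    scores = [score(call_model(prompt)) for _ in range(runs)]
    print(
        f"runs={runs} "
        f"mean={statistics.mean(scores):.2f} "
        f"min={min(scores):.2f} max={max(scores):.2f} "
        f"stdev={statistics.stdev(scores):.2f}"
    )


if __name__ == "__main__":
    evaluate("your real-world task prompt here")
```

Nothing fancy, just report mean and spread per model instead of a single run, and the spread is often big enough to change which model "wins".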