- All things considered, Anthropic seems to be doing most things the right way, they seem more focused on professional use than OpenAI and Grok, and Opus 4.5 is really an incredibly good model.
Yes, they know how to use their safety research as marketing, and yes, they got a big DoD contract, but I don’t think that fundamentally conflicts with their core mission.
And honestly, some of their research they publish is genuinely interesting.
- Yeah no, this is very much not true, even more so for a Go-based implementation and for ARM devices optimized for energy consumption.
- I had a “somebody is wrong on the internet!!” discussion about exactly this a few weeks ago, with someone who claimed to be a professor of AI.
Where do people get the idea that temperature affects caching in any way? Temperature is about next-token prediction / output, not input.
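To make that concrete, here’s a minimal, hypothetical sketch (not any vendor’s actual implementation): a prompt-cache key is derived purely from the input tokens, while temperature only enters at the sampling step, after the (possibly cached) forward pass has already produced the logits.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math"
	"math/rand"
)

// Hypothetical: the cache key for a prompt prefix depends only on the input
// tokens. Temperature never enters into it.
func cacheKey(promptTokens []int) [32]byte {
	b := make([]byte, 0, len(promptTokens)*4)
	for _, t := range promptTokens {
		b = append(b, byte(t), byte(t>>8), byte(t>>16), byte(t>>24))
	}
	return sha256.Sum256(b)
}

// Temperature only rescales the logits when sampling the *next* token,
// i.e. on the output side.
func sample(logits []float64, temperature float64, rng *rand.Rand) int {
	weights := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		weights[i] = math.Exp(l / temperature)
		sum += weights[i]
	}
	r := rng.Float64() * sum
	for i, w := range weights {
		r -= w
		if r <= 0 {
			return i
		}
	}
	return len(weights) - 1
}

func main() {
	prompt := []int{101, 2054, 2003}
	fmt.Printf("cache key: %x\n", cacheKey(prompt)) // identical at any temperature
	rng := rand.New(rand.NewSource(1))
	fmt.Println("sampled token:", sample([]float64{1.0, 2.0, 0.5}, 0.7, rng))
}
```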
- So basically oauth-style app connections. Makes sense.
- You’re misunderstanding: you just convert to 32 bits once and reuse that same register all the time.
You’re running the exact same code, but you’re more efficient in the sense that you use the data for comparison immediately after converting it, which means it’s likely still in a register or the L1 cache.
- It can, because of how CPUs work with registers and hot code paths and all that.
First normalizing everything and then comparing normalized versions isn’t as fast.
And it also enables “stopping early” once a match has been found or ruled out: you may not actually have to convert everything.
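A minimal sketch of the difference in Go (rune-level ToLower stands in for full Unicode case folding here; the stdlib’s strings.EqualFold works roughly like the streaming variant):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Normalize-first: lowercase both strings completely, then compare.
// All the converted data has to be written out and read back.
func equalFoldNormalizeFirst(a, b string) bool {
	return strings.ToLower(a) == strings.ToLower(b)
}

// Fold-on-the-fly: decode one rune at a time, lowercase it while it is still
// in a register, compare immediately, and stop at the first mismatch.
func equalFoldStreaming(a, b string) bool {
	for len(a) > 0 && len(b) > 0 {
		ra, sa := utf8.DecodeRuneInString(a)
		rb, sb := utf8.DecodeRuneInString(b)
		if unicode.ToLower(ra) != unicode.ToLower(rb) {
			return false // early exit: nothing past this point ever gets converted
		}
		a, b = a[sa:], b[sb:]
	}
	return len(a) == 0 && len(b) == 0
}

func main() {
	fmt.Println(equalFoldNormalizeFirst("Hello, WORLD", "hello, world")) // true
	fmt.Println(equalFoldStreaming("Hello, WORLD", "hello, world"))      // true
	fmt.Println(equalFoldStreaming("Apples", "Oranges"))                 // false after the first rune
}
```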
- That’s not practical in many situations, as the normalization alone may very well be more expensive than the search.
If you’re in control of all data representations in your entire stack, then yes of course, but that’s hardly ever the case and different tradeoffs are made at different times (eg storage in UTF-8 because of efficiency, but in-memory representation in UTF-32 because of speed).
- That’s an unhelpful take if you expect everyone to be fluent in the language of the country they’re traveling to.
Another note: I live in Cambodia, where many French people live, and nearly none of them speak the local language, and a fair number of them don’t even speak English. Worse yet, the older generation is still hung up on the idea that it’s better for the locals to learn French than English or Chinese.
This is really a very French thing, and you don’t see the same behavior in eg Germany or Italy.
(I’m originally from The Netherlands)
- It’s much easier to tax the general population than businesses, as individuals don’t push back as much.
It’s the same pattern everywhere around the world (perhaps there are a few exceptions). Businesses can be much more creative with tax evasion as well.
- Aren’t those “record it all” applications implemented with RAG, injecting content into the context based on embedding similarity?
Obviously you’re not going to always inject everything into the context window.
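Something like this hypothetical sketch (types and field names are made up): store embeddings alongside the recorded chunks, then at query time inject only the top-k most similar chunks into the prompt.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a hypothetical stored record with a precomputed embedding.
type Chunk struct {
	Text      string
	Embedding []float64
}

func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-12)
}

// topK returns the k chunks most similar to the query embedding; only these
// get injected into the context window, not the whole recorded history.
func topK(query []float64, store []Chunk, k int) []Chunk {
	sorted := append([]Chunk(nil), store...)
	sort.Slice(sorted, func(i, j int) bool {
		return cosine(query, sorted[i].Embedding) > cosine(query, sorted[j].Embedding)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	store := []Chunk{
		{"meeting notes about the Q3 budget", []float64{0.9, 0.1, 0.0}},
		{"grocery list", []float64{0.0, 0.2, 0.9}},
		{"email thread about budget approval", []float64{0.8, 0.3, 0.1}},
	}
	queryEmbedding := []float64{0.85, 0.2, 0.05} // embedding of the user's question
	for _, c := range topK(queryEmbedding, store, 2) {
		fmt.Println("inject into context:", c.Text)
	}
}
```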
- I think it may be about the absolute memory address at which the secret is stored, which may itself be exploitable (ie you’re thinking about the offset value, rather than the pointer value). It’s about leaking even indirect information that could be exploited in different ways. From my understanding, this type of cryptography goes to extreme lengths to basically hide everything.
That’s my hunch at least, but I’m not a security expert.
The example could probably have been phrased better.
- How is asking for clarification before pushing back a bad thing?
- My point is that it’s better that the model asks questions to better understand what’s going on before pushing back.
- Well yes, but asking the model to ask questions to resolve ambiguities is critical if you want to have any success in eg a coding assistant.
There are shitloads of ambiguities. Most of the problems people have with LLMs come from the implicit assumptions being made.
Phrased differently, telling the model to ask questions before responding to resolve ambiguities is an extremely easy way to get a lot more success.
- Because of privacy reasons? Yeah, I’m not going to spend a small fortune on that just to be able to use these types of models.
- I don’t understand what you’re saying. What’s preventing you from using eg OpenRouter to run a query against Kimi-K2 from whatever provider?
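A minimal sketch, assuming OpenRouter’s OpenAI-compatible chat completions endpoint; the model slug and the exact routing behavior are assumptions on my part, so check their docs for the current identifiers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumed model slug ("moonshotai/kimi-k2"); OpenRouter picks one of the
	// available providers for it unless you pin a specific one.
	body, _ := json.Marshal(map[string]any{
		"model": "moonshotai/kimi-k2",
		"messages": []map[string]string{
			{"role": "user", "content": "Explain the difference between UTF-8 and UTF-32."},
		},
	})

	req, err := http.NewRequest("POST", "https://openrouter.ai/api/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENROUTER_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```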
- I don’t understand the point you’re trying to make. LLMs are not humans.
From my perspective, the whole problem with LLMs (at least for writing code) is that they shouldn’t assume anything, should follow the instructions faithfully, and should ask the user for clarification if there is ambiguity in the request.
I find it extremely annoying when the model pushes back / disagrees, instead of asking for clarification. For this reason, I’m not a big fan of Sonnet 4.5.
- I don’t think it will ever make sense; you can buy so much cloud-based usage for that kind of price.
From my perspective, the biggest problem is that I’m just not going to be using it 24/7, which means I’m not getting nearly as much value out of it as the cloud-based vendors do from their hardware.
Last but not least, if I want to run queries against open source models, I prefer to use a provider like Groq or Cerebras as it’s extremely convenient to have the query results nearly instantly.
- > As a chatbot, it's the only one that seems to really relish calling you out on mistakes or nonsense, and it doesn't hesitate to be blunt with you.
My experience is that Sonnet 4.5 does this a lot as well, but more often than not it’s due to a lack of full context, eg accusing the user of not doing X or Y when it simply wasn’t told that had already been done, and then proceeding to apologize.
How is Kimi K2 in this regard?
Isn’t “instruction following” the most important thing you’d want from a model in general, and isn’t a model that pushes back more likely than not to be wrong?
It’s more like an assistant that advises you rather than a tool that you hand full control to.
Not saying that either is better, but they’re not the same thing.