mips_avatar
1,985 karma
Me@jonready.com
Currently working on wanderfugl.com
- As someone who doesn't spend a huge amount of time thinking about art, this piece soothed me more than I could have expected. Thank you
- Yeah, but not super fast like Flash or Grok Fast
- I'm sure they do something like that. I've noticed Azure has way faster GPT-4.1 than OpenAI
- I'm pretty sure xAI exclusively uses Nvidia H100s for Grok inference, but I could be wrong. I agree that I don't see why TPUs would necessarily explain latency.
- I just want a DeepSeek moment for an open-weights model fast enough to use in my app; I hate paying the big guys.
- I don't think the models are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing this: particularly in voice, it's worth it to use a smaller local LLM to handle the acknowledgment before handing it off (rough sketch below).
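A minimal sketch of that handoff, assuming the openai Python SDK, a hosted frontier model for the real answer, and a small local model behind an OpenAI-compatible server (e.g. Ollama) for the acknowledgment; the model names and the speak() stub are placeholders:

```python
import asyncio
from openai import AsyncOpenAI

# Assumed endpoints: a small local model behind an OpenAI-compatible
# server (e.g. Ollama) and a hosted frontier model for the full answer.
local = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="unused")
remote = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

def speak(text: str) -> None:
    print(text)  # stand-in for a real TTS hook

async def answer(user_text: str) -> str:
    # Start the slow, high-quality response immediately...
    full_task = asyncio.create_task(
        remote.chat.completions.create(
            model="gpt-4.1",  # placeholder model name
            messages=[{"role": "user", "content": user_text}],
        )
    )
    # ...and cover the dead air with a fast local acknowledgment.
    ack = await local.chat.completions.create(
        model="llama3.2:1b",  # placeholder small model
        messages=[{
            "role": "user",
            "content": f"Acknowledge this in five words or fewer: {user_text}",
        }],
        max_tokens=16,
    )
    speak(ack.choices[0].message.content or "One sec...")
    full = await full_task
    return full.choices[0].message.content or ""

print(asyncio.run(answer("Summarize the plot of Hamlet.")))
```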
- I mean, they're trying to outdo Google, so they need to do that.
- I cannot comprehend how they do not care about this segment of the market.
- I'll try benchmarking Mistral against my eval; I've been impressed by Kimi's performance, but it's too slow to do anything useful in realtime.
- It's weird they don't document this stuff. Understanding things like tool call latency and time to first token is extremely important in application development, so you end up measuring it yourself (sketch below).
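A quick way to measure time to first token yourself, assuming the openai Python SDK with streaming; the model name is just an example:

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def time_to_first_token(model: str, prompt: str) -> float:
    """Seconds from sending the request to the first streamed content token."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Skip keepalive/empty chunks; stop at the first real token.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    raise RuntimeError("stream ended without any content")

print(f"TTFT: {time_to_first_token('gpt-4.1-mini', 'Say hi.'):.3f}s")
```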
- OpenAI made a huge mistake neglecting fast inference models. Their strategy was GPT-5 for everything, which hasn't worked out at all. I'm really not sure which model OpenAI wants me to use for applications that require lower latency. If I follow the advice in their API docs about faster responses, I'm told to either use GPT-5 with low thinking, replace GPT-5 with GPT-4.1, or switch to the mini model. So as a developer I'm running evals on all three of those combinations (a stripped-down version of that loop is below). I'm running my evals on Gemini 3 Flash right now, and without thinking enabled it's outperforming GPT-5 with thinking. OpenAI should stop trying to come up with ads and make models that are useful.
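A stripped-down version of that kind of eval loop, assuming the openai SDK; the test cases, substring grading, and the reasoning_effort knob for "low thinking" are stand-ins for a real suite:

```python
from openai import OpenAI

client = OpenAI()

# Toy cases; a real eval would load a task-specific dataset and grader.
CASES = [
    {"prompt": "What is 17 * 24? Answer with just the number.", "expect": "408"},
    {"prompt": "What is the capital of Australia?", "expect": "Canberra"},
]

# The three combinations the docs suggest (model names/params assumed).
CANDIDATES = [
    ("gpt-5", {"reasoning_effort": "low"}),  # GPT-5, low thinking
    ("gpt-4.1", {}),                         # swap GPT-5 for GPT-4.1
    ("gpt-5-mini", {}),                      # the mini model
]

for model, extra in CANDIDATES:
    passed = 0
    for case in CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": case["prompt"]}],
            **extra,
        )
        text = resp.choices[0].message.content or ""
        passed += case["expect"].lower() in text.lower()
    print(f"{model}: {passed}/{len(CASES)} passed")
```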
- I mean, I do too; I had a really odd Gemini bug until I did BYOK on OpenRouter
- I'd be curious how many people use OpenRouter BYOK just to avoid figuring out the cloud consoles for GCP/Azure.
- For my app's evals, Gemini Flash and Grok 4 Fast are the only models worth using. I'd love for an open-weights model to compete in this arena, but I haven't found one.
- I'm more curious how Gemini 3 Flash Lite performs and is priced when it comes out, because it may be that for most non-coding tasks the distinction isn't between Pro and Flash but between Flash and Flash Lite.
- It's important in a democracy for people to pay taxes; it includes them in the system in a way that doesn't exist in a place like Saudi Arabia. We shouldn't aim for a bread-and-circuses society, but for one where we aggressively commoditize technology for the benefit of everyone.
- More like giving them a speed-limited Citi Bike and expecting them to train for cyclocross.
- It's a lot stronger at geospatial intelligence tasks than any other model in my experience. Shame it's so slow in terms of tokens per second.
- That's why you have to make your own geocoder.