
mips_avatar
Joined · 1,985 karma
Me@jonready.com

Currently working on wanderfugl.com


  1. As someone who doesn't spend a huge amount of time thinking about art, this piece soothed me more than I could have expected. Thank you.
  2. Yeah, but not super fast like Flash or Grok Fast.
  3. I'm sure they do something like that. I've noticed Azure serves GPT-4.1 way faster than OpenAI does.
  4. I'm pretty sure xAI exclusively uses Nvidia H100s for Grok inference, but I could be wrong. I agree that I don't see why TPUs would necessarily explain the latency.
  5. I just want a DeepSeek moment for an open-weights model fast enough to use in my app; I hate paying the big guys.
  6. I don't think the models are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing this. In voice in particular, it's worth using a smaller local LLM to handle the acknowledgment before handing off to the big model (see the first sketch after this list).
  7. I mean, they're trying to outdo Google, so they need to do that.
  8. I cannot comprehend how they do not care about this segment of the market.
  9. I'll try benchmarking Mistral against my eval; I've been impressed by Kimi's performance, but it's too slow to do anything useful in realtime.
  10. It's weird that they don't document this stuff. Understanding things like tool call latency and time to first token is extremely important in application development (see the TTFT sketch after this list).
  11. OpenAI made a huge mistake neglecting fast inference models. Their strategy was GPT-5 for everything, which hasn't worked out at all. I'm really not sure which model OpenAI wants me to use for applications that require lower latency: if I follow the advice in their API docs about faster responses, I'm told to either run GPT-5 with low thinking, replace GPT-5 with GPT-4.1, or switch to the mini model. So as a developer I'm running evals on all three of those combinations (see the eval-loop sketch after this list). I'm running my evals on Gemini 3 Flash right now, and without thinking enabled it's outperforming GPT-5 with thinking. OpenAI should stop trying to come up with ads and make models that are useful.
  12. I mean, I do too; I had a really odd Gemini bug until I did BYOK on OpenRouter.
  13. I'd be curious how many people use OpenRouter BYOK just to avoid figuring out the cloud consoles for GCP/Azure.
  14. For my app's evals, Gemini Flash and Grok 4 Fast are the only models worth using. I'd love for an open-weights model to compete in this arena, but I haven't found one.
  15. I'm more curious how Gemini 3 Flash Lite performs and is priced when it comes out, because it may be that for most non-coding tasks the real distinction isn't between Pro and Flash but between Flash and Flash Lite.
  16. It's important in a democracy for people to pay taxes, it includes them in the system in a way that doesn't exist in a place like Saudi Arabia. We shouldn't aim for a bread and circuses society, but for one where we aggressively commoditize technology for the benefit of everyone.
  17. More like giving them a speed-limited Citi Bike and expecting them to train for cyclocross.
  18. It's a lot stronger for geospatial intelligence tasks than any other model in my experience. Shame it's so slow in terms of tokens per second.
  19. That's why you have to make your own geocoder (see the gazetteer sketch after this list).
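
The handoff pattern from comment 6, as a minimal sketch: a tiny local model produces the spoken acknowledgment immediately while the larger hosted model generates the real reply in parallel. Both model calls here are hypothetical placeholders, not any specific library's API.

```python
# Sketch of the voice-agent pattern in comment 6: a small local model
# acknowledges the user right away while the big model works in parallel.
import asyncio

async def local_ack(user_text: str) -> str:
    # Hypothetical tiny local LLM: cheap, very low time-to-first-token.
    await asyncio.sleep(0.05)  # placeholder for a fast local call
    return "Sure, one sec..."

async def big_model_reply(user_text: str) -> str:
    # Hypothetical large hosted model: higher quality, higher latency.
    await asyncio.sleep(1.5)  # placeholder for the slow API call
    return f"Here's a full answer to: {user_text}"

async def handle_turn(user_text: str) -> None:
    # Kick off the big model first so it runs while the ack is spoken.
    reply_task = asyncio.create_task(big_model_reply(user_text))
    print(await local_ack(user_text))  # speak this immediately
    print(await reply_task)            # then deliver the real answer

asyncio.run(handle_turn("What's the weather in Oslo tomorrow?"))
```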
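On comment 10: since providers rarely document time to first token, one rough way to measure it yourself is to time a streaming request until the first content chunk arrives. This sketch uses the OpenAI Python SDK's streaming interface; the model name and prompt are just examples.

```python
# Measure time-to-first-token (TTFT) by timing a streaming completion
# until the first chunk that actually carries content.
import time
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def time_to_first_token(model: str, prompt: str) -> float:
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # The first chunk with non-empty delta content marks first-token arrival.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    raise RuntimeError("stream ended without any content")

print(f"TTFT: {time_to_first_token('gpt-4.1-mini', 'Say hi.'):.3f}s")
```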
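On comments 11 and 14: a minimal sketch of that kind of cross-model eval loop, routed through OpenRouter's OpenAI-compatible endpoint so one client can hit several providers (BYOK, per comments 12 and 13, is configured in the OpenRouter dashboard rather than in code). The model IDs and the single test case are illustrative, not a recommendation; check the OpenRouter catalog for current names.

```python
# Tiny eval harness: run each candidate model over the same cases and
# score by a substring check. Swap in your real eval logic as needed.
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

MODELS = ["google/gemini-2.5-flash", "x-ai/grok-4-fast", "openai/gpt-4.1-mini"]
CASES = [
    {"prompt": "What is the capital of Norway? Answer in one word.",
     "expect": "oslo"},
]

for model in MODELS:
    passed = 0
    for case in CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        answer = (resp.choices[0].message.content or "").lower()
        passed += case["expect"] in answer
    print(f"{model}: {passed}/{len(CASES)} passed")
```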
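And comment 19's "make your own geocoder" at its simplest: an in-memory gazetteer lookup instead of a hosted geocoding API. The three entries use real coordinates; a real version would load a dataset such as the free GeoNames dump and add fuzzy matching.

```python
# Minimal gazetteer geocoder: normalize the name, exact-match the table.
GAZETTEER = {
    "oslo": (59.9139, 10.7522),
    "bergen": (60.3913, 5.3221),
    "trondheim": (63.4305, 10.3951),
}

def geocode(place: str) -> tuple[float, float] | None:
    # Exact match after normalization; fuzzy matching is the obvious next step.
    return GAZETTEER.get(place.strip().lower())

print(geocode("Oslo"))      # (59.9139, 10.7522)
print(geocode("Atlantis"))  # None
```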

This user hasn’t submitted anything.