mips_avatar
1,985 karma
Me@jonready.com
Currently working on wanderfugl.com
- As someone who doesn't spend a huge amount of time thinking about art, this piece soothed me more than I could have expected. Thank you
- Yeah, but not super fast like Flash or Grok Fast
- I'm sure they do something like that. I've noticed Azure has way faster GPT-4.1 than OpenAI
- I'm pretty sure xAI exclusively uses Nvidia H100s for Grok inference, but I could be wrong. I agree that I don't see why TPUs would necessarily explain latency.
- I just want a DeepSeek moment for an open-weights model fast enough to use in my app; I hate paying the big guys.
- I don't think the models are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing this: particularly in voice, it's worth it to use a smaller local LLM to handle the acknowledgment before handing it off (rough sketch below).
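A minimal sketch of that handoff, assuming the openai Python SDK, a hosted frontier model for the real answer, and a small local model behind an OpenAI-compatible server (e.g. Ollama) for the acknowledgment; the model names and the speak() stub are placeholders:

```python
import asyncio
from openai import AsyncOpenAI

# Assumed endpoints: a small local model behind an OpenAI-compatible
# server (e.g. Ollama) and a hosted frontier model for the full answer.
local = AsyncOpenAI(base_url="http://localhost:11434/v1", api_key="unused")
remote = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

def speak(text: str) -> None:
    print(text)  # stand-in for a real TTS hook

async def answer(user_text: str) -> str:
    # Start the slow, high-quality response immediately...
    full_task = asyncio.create_task(
        remote.chat.completions.create(
            model="gpt-4.1",  # placeholder model name
            messages=[{"role": "user", "content": user_text}],
        )
    )
    # ...and cover the dead air with a fast local acknowledgment.
    ack = await local.chat.completions.create(
        model="llama3.2:1b",  # placeholder small model
        messages=[{
            "role": "user",
            "content": f"Acknowledge this in five words or fewer: {user_text}",
        }],
        max_tokens=16,
    )
    speak(ack.choices[0].message.content or "One sec...")
    full = await full_task
    return full.choices[0].message.content or ""

print(asyncio.run(answer("Summarize the plot of Hamlet.")))
```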
- I mean, they're trying to outdo Google, so they need to do that.
- I cannot comprehend how they do not care about this segment of the market.
- I'll try benchmarking Mistral against my eval; I've been impressed by Kimi's performance, but it's too slow to do anything useful in realtime.
- It's weird they don't document this stuff. Understanding things like tool call latency and time to first token is extremely important in application development, so you end up measuring it yourself (sketch below).
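A quick way to measure time to first token yourself, assuming the openai Python SDK with streaming; the model name is just an example:

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def time_to_first_token(model: str, prompt: str) -> float:
    """Seconds from sending the request to the first streamed content token."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Skip keepalive/empty chunks; stop at the first real token.
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    raise RuntimeError("stream ended without any content")

print(f"TTFT: {time_to_first_token('gpt-4.1-mini', 'Say hi.'):.3f}s")
```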
- OpenAI made a huge mistake neglecting fast inference models. Their strategy was GPT-5 for everything, which hasn't worked out at all. I'm really not sure which model OpenAI wants me to use for applications that require lower latency. If I follow the advice in their API docs about faster responses, I'm told to either use GPT-5 with low thinking, replace GPT-5 with GPT-4.1, or switch to the mini model. So as a developer I'm running evals on all three of those combinations (a stripped-down version of that loop is below). I'm running my evals on Gemini 3 Flash right now, and without thinking enabled it's outperforming GPT-5 with thinking. OpenAI should stop trying to come up with ads and make models that are useful.
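A stripped-down version of that kind of eval loop, assuming the openai SDK; the test cases, substring grading, and the reasoning_effort knob for "low thinking" are stand-ins for a real suite:

```python
from openai import OpenAI

client = OpenAI()

# Toy cases; a real eval would load a task-specific dataset and grader.
CASES = [
    {"prompt": "What is 17 * 24? Answer with just the number.", "expect": "408"},
    {"prompt": "What is the capital of Australia?", "expect": "Canberra"},
]

# The three combinations the docs suggest (model names/params assumed).
CANDIDATES = [
    ("gpt-5", {"reasoning_effort": "low"}),  # GPT-5, low thinking
    ("gpt-4.1", {}),                         # swap GPT-5 for GPT-4.1
    ("gpt-5-mini", {}),                      # the mini model
]

for model, extra in CANDIDATES:
    passed = 0
    for case in CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": case["prompt"]}],
            **extra,
        )
        text = resp.choices[0].message.content or ""
        passed += case["expect"].lower() in text.lower()
    print(f"{model}: {passed}/{len(CASES)} passed")
```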
- I mean, I do too; I had a really odd Gemini bug until I did BYOK on OpenRouter
- I'd be curious how many people use OpenRouter BYOK just to avoid figuring out the cloud consoles for GCP/Azure.
- For my app's evals, Gemini Flash and Grok 4 Fast are the only models worth using. I'd love for an open-weights model to compete in this arena, but I haven't found one.
- I'm more curious how Gemini 3 Flash Lite performs and is priced when it comes out, because it may be that for most non-coding tasks the distinction isn't between Pro and Flash but between Flash and Flash Lite.
- It's important in a democracy for people to pay taxes; it includes them in the system in a way that doesn't exist in a place like Saudi Arabia. We shouldn't aim for a bread-and-circuses society, but for one where we aggressively commoditize technology for the benefit of everyone.
- More like giving them a speed-limited Citi Bike and expecting them to train for cyclocross.
- It's a lot stronger at geospatial intelligence tasks than any other model in my experience. Shame it's so slow in terms of tokens per second.
- That's why you have to make your own geocoder.