film42 [at] google mail
- Is OpenRouter still sending all OCR jobs to Mistral? I wonder if they're trying to keep that spot. Seems like Mistral and Google are the best at OCR right now, with Google leading Mistral by a fair bit.
- Installed _on_ an engine that operates at 200°C!
- This is why I migrated my apps that need an LLM to Gemini. No model degradation so far, all through the v2.5 model generation. What is Anthropic doing? Swapping in a quantized version of the model?
- At this point I'm only using Google models via Vertex AI for my apps. They have a weird QoS rate limit, but in general Gemini has been consistently top tier for everything I've thrown at it.
Anecdotal, but I also haven't experienced any regression in Gemini quality, whereas Claude/OpenAI seem to push iterative updates (or quantized variants for performance) that cause my test bench to fail more often.
- It's a cash grab. More conversational AI means more folks running out of free or lower paid tier tokens faster, leading to more upsell opportunities. API users will pay more in output tokens by default.
For example, I asked Claude a high-level question about p2p systems and it started writing code in 3 languages. Ignoring the code, I asked a follow-up about the fundamentals; it answered and then rewrote the code 3 times. After a few minutes I hit a token limit for the first time.
- AKA... 70% of existing Google Cloud users have filled out a support ticket, which starts with a chat bot.
- Maybe look at R2 or Wasabi instead of S3. That would cut your storage bill roughly 3x and take your cloud network bill to zero. IMO self-managing DBs always sucks no matter what you do.
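A back-of-the-envelope sketch of that claim. All prices and volumes below are my assumptions from public list pricing, not figures from the thread; verify current rates before deciding.

```python
# Rough monthly cost comparison (assumed list prices; check before relying
# on them): S3 Standard ~$0.023/GB-mo storage + ~$0.09/GB egress,
# R2 ~$0.015/GB-mo with $0 egress, Wasabi ~$0.007/GB-mo with no egress fees.
storage_gb = 10_000  # hypothetical: 10 TB stored
egress_gb = 2_000    # hypothetical: 2 TB/month read over the network

s3 = storage_gb * 0.023 + egress_gb * 0.09
r2 = storage_gb * 0.015
wasabi = storage_gb * 0.007

print(f"S3: ${s3:,.0f}/mo, R2: ${r2:,.0f}/mo, Wasabi: ${wasabi:,.0f}/mo")
# S3: $410/mo, R2: $150/mo, Wasabi: $70/mo
```

Storage alone drops roughly 3x, and the egress line item disappears entirely on R2/Wasabi, which is where the claim above comes from.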
- Or you have PTSD after 10 years of being on-call 24/7 for your company's stack. I've built my next chapter around offloading the pager. Worth every penny.
- They created their business on open source. Free software was their top of funnel. Free customers become paid customers, and fund the business. They are more than welcome to change this, but there is no way they don't end up with egg on their face, and that's what we're seeing here.
- I have a feeling this is IBM dipping their toes into the Postgres water after seeing Databricks acquire Neon and Snowflake acquire Crunchy Data. I won't be surprised if IBM acquires them next year. It makes a lot of sense and I wish everyone the best of luck.
From the outset, CockroachDB has been a "just scale it up" product, with an open-source presence, a serverless offering priced similarly to AWS's DSQL, and enterprise support. There is a lot to work with there if you're IBM.
- I'm a fan because it's something you can explicitly turn on and off. For my Docker-based app, I really want to verify the completeness of imports, preferably at build and test time. In fact, most of the time I will likely disable lazy loading outright. But I would really appreciate a faster-loading CLI tool.
However, there is a pattern in Python of raising an error if, say, pandas doesn't have an Excel library installed, which is fine. In the future, will maintainers opt to include a bunch of otherwise-unused libraries since they no longer hurt startup time? (Think pandas including 3-4 Excel parsers by default, since each is only loaded when called.) It's a much better UX, but now if you opt out of lazy loading, your code will take longer to load than it would have before.
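For context, here's a minimal sketch of the optional-dependency pattern I mean. The function is hypothetical, but pandas does something very similar internally:

```python
def read_excel(path):
    # The heavy optional dependency is only imported when the feature is
    # actually used, so importing the package itself stays fast.
    try:
        import openpyxl  # optional Excel engine
    except ImportError as exc:
        raise ImportError(
            "read_excel requires 'openpyxl'; install it with "
            "'pip install openpyxl'"
        ) from exc
    return openpyxl.load_workbook(path)
```

With language-level lazy imports, a maintainer gets the same deferred cost by just listing every engine as a top-level import, which is exactly the temptation I'm worried about.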
- My guess is both will look about the same with real-world workloads. Worker is certainly more predictable, which is safer in general. That said, I appreciate the callout about signal throughput on workers (fewer connections farmed out to more processes, vs. each connection getting its own io_uring setup with the upper bound being the throughput of a single process). Again, I doubt it makes any difference for 99.9999% of apps out there.
- Very cool post. If Jeff Geerling is reading this, I wouldn't mind watching a video on each of these ;)
- I think I update Vundle like once every 3 years.
- The 1M token context was Gemini's headlining feature. Now, the main thing I'd like Anthropic to work on is how tokens are counted for document processing: Gemini will often bill 1/10th the tokens Anthropic does for the same document.
- Agree, but pricing-wise Gemini 2.5 Pro wins. Gemini input tokens are half the cost of Claude 4's, and output is $5/million cheaper. But document processing is where the gap really shows: a 5MB PDF (customer invoice) is like 5k tokens with Gemini vs 56k with Claude (rough math in the sketch below).
The only downside with Gemini (and it's a big one) is availability. We get rate limited by their dynamic QoS all the time, even when we haven't reached our quota. Our GCP sales rep keeps recommending "provisioned throughput," but it's both expensive and a poor fit for our workload type. Plus, the Vertex AI SDK is kind of a PITA compared to Anthropic's.
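Rough math on that invoice example. The per-million-token prices below are my assumptions from published list pricing (verify current rates); the token counts are the ones from the document above:

```python
# Assumed input prices: Gemini 2.5 Pro ~$1.25/1M tokens,
# Claude Sonnet 4 ~$3.00/1M tokens. Token counts from the 5MB invoice above.
gemini_cost = 5_000 / 1e6 * 1.25    # ~$0.006 per document
claude_cost = 56_000 / 1e6 * 3.00   # ~$0.17 per document

print(f"Gemini: ${gemini_cost:.4f}, Claude: ${claude_cost:.4f}, "
      f"ratio: {claude_cost / gemini_cost:.0f}x")
# Gemini: $0.0063, Claude: $0.1680, ratio: 27x
```

The ratio is dominated by how the tokens are counted, not the per-token price, which is why document processing matters more than the headline rates.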
- Is there a crowd-sourced sentiment score for models? I know all the benchmark scores are juiced like crazy; I stopped taking them at face value months ago. What I want to know is whether other folks out there actually use a given model day to day, or find it unreliable.
- Thanks for your comment! I have a few PDFs that I need to generate for groups of users every so often, and since wkhtmltopdf is considered EOL, I'd been forced to use Chrome (which sucks to manage). I just rewrote that code to use Typst (via the typst gem) and it's so, so much better.
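For anyone not on Ruby: a minimal sketch of the same idea, shelling out to the typst CLI instead of using the gem (the file names are hypothetical):

```python
import subprocess

# Compile a Typst source file to PDF using the typst CLI.
subprocess.run(
    ["typst", "compile", "invoice.typ", "invoice.pdf"],
    check=True,  # raise CalledProcessError if compilation fails
)
```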