650 points | 299 comments | blog.google
Docs: https://ai.google.dev/gemini-api/docs/gemini-3
Developer Blog: https://blog.google/technology/developers/build-with-gemini-...
Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/
Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...
Deepmind Page: https://deepmind.google/models/gemini/flash/
I have been playing with it for the past few weeks and it's genuinely my new favorite; it's so fast and has such vast world knowledge that it's more performant than Claude Opus 4.5 or GPT 5.2 extra high, for a fraction (basically an order of magnitude less!!) of the inference time and price.
After reading your comment I ran my product benchmark against 2.5 flash, 2.5 pro and 3.0 flash.
The results are better AND the response times have stayed the same. What an insane gain - especially considering the price compared to 2.5 Pro. I'm about to get much better results for 1/3rd of the price. Not sure what magic Google did here, but I would love to see a more technical deep dive comparing what they do differently in the Pro and Flash models to achieve such performance.
Also wondering, how did you get early access? I'm using the Gemini API quite a lot and have a quite nice internal benchmark suite for it, so would love to toy with the new ones as they come out.
I periodically ask them questions about topics that are subtle or tricky, and somewhat niche, that I know a lot about, and find that they frequently provide extremely bad answers. There have been improvements on some topics, but there's one benchmark question that I have that just about every model I've tried has completely gotten wrong.
Tried it on LMArena recently, got a comparison between Gemini 2.5 flash and a codenamed model that people believe was a preview of Gemini 3 flash. Gemini 2.5 flash got it completely wrong. Gemini 3 flash actually gave a reasonable answer; not quite up to the best human description, but it's the first model I've found that actually seems to mostly correctly answer the question.
So, it's just one data point, but at least for my one fairly niche benchmark problem, Gemini 3 Flash has successfully answered a question that none of the others I've tried have (I haven't actually tried Gemini 3 Pro, but I'd compared various Claude and ChatGPT models, and a few different open weights models).
So, guess I need to put together some more benchmark problems, to get a better sample than one, but it's at least now passing a "I can find the answer to this in the top 3 hits in a Google search for a niche topic" test better than any of the other models.
Still a lot of things I'm skeptical about in all the LLM hype, but at least they are making some progress in being able to accurately answer a wider range of questions.
The only fast non-TPU models I'm aware of are things running on Cerebras, which can be much faster because of their wafer-scale chips, and Grok, which has a super fast mode but with the cheat code of ignoring guardrails and making up its own world knowledge.
https://artificialanalysis.ai/evaluations/omniscience
This story also shows the market corruption of Google's monopolies, but a judge recently gave them his stamp of approval so we're stuck with it for the foreseeable future.
/s
I have not worked with Sonnet enough to give an opinion there.
...and all of that done without any GPUs as far as I know! [1]
[1] - https://www.uncoveralpha.com/p/the-chip-made-for-the-ai-infe...
(tldr: afaik Google trained Gemini 3 entirely on tensor processing units - TPUs)
Claude has been a coding model from the start, but GPT is more and more becoming a coding model too.
Pretty much every person in the first (and second) world is using AI now, and only a small fraction of those people are writing software. This is also reflected in OAI's report from a few months ago, which found programming to be only 4% of tokens.
This sounds like you live in a huge echo chamber. :-(
I hope open source AI models catch up to Gemini 3 / Gemini 3 Flash. Or Google open sources it, but let's be honest, Google isn't open sourcing Gemini 3 Flash. I guess the best bets in open source nowadays are probably GLM or DeepSeek Terminus, or maybe Qwen/Kimi too.
For me the bigger concern, which I have mentioned on other AI related topics, is that AI is eating all the production of computer hardware, so we should be worrying about hardware prices getting out of hand and making it harder for the general public to run open source models. Hence I am rooting for China to reach parity on node size and crash PC hardware prices.
I've been playing around with other models recently (Kimi, GPT Codex, Qwen, others) to try to better appreciate the difference. I knew there was a big price difference, but watching myself feed dollars into the machine rather than nickels has also given me quite the reverse appreciation.
I only assume "if you're not getting charged, you are the product" has to be somewhat in play here. But when working on open source code, I don't mind.
Otherwise, if it's a short prompt or answer, a SOTA (state of the art) model will be cheap anyway, and if it's a long prompt/answer, it's way more likely to be wrong and a lot more time/human cost is spent on checking/debugging any issue or hallucination, so again SOTA is better.
I tried to be quite clear about showing my work here. I agree that 17x is much closer to a single order of magnitude than two. But 60x is, to me, enough of the way to 100x that yeah, I don't feel bad saying it's nearly two orders (it's 1.78 orders of magnitude). To me, your complaint feels rigid & ungenerous.
My post is showing to me as -1, but I stand by it right now. Arguing over the technicalities here (is 1.78 close enough to 2 orders to count?) feels beside the point to me: DeepSeek is vastly more affordable than nearly everything else, putting even Gemini 3 Flash here to shame. And I don't think people are aware of that.
I guess for my own reference, since I didn't do it the first time: at $0.50/$3.00 per M input/output tokens, Gemini 3 Flash here is 1.8x & 7.1x (1e1.86) more expensive than DeepSeek.
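If anyone wants to sanity-check the order-of-magnitude framing, it's just log10 of the price ratio; a quick sketch using only the ratios quoted above (I'm not asserting anyone's current price sheet):

    import math

    # Ratios quoted in this thread; log10 converts a ratio into "orders of magnitude".
    ratios = {"17x": 17, "60x": 60, "1.8x input": 1.8, "7.1x output": 7.1}

    for label, r in ratios.items():
        print(f"{label}: {math.log10(r):.2f} orders of magnitude")

    # 60x comes out to ~1.78 orders, which is the number being argued about above.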
They are pushing the prices higher with each release though: API pricing is up to $0.5/M for input and $3/M for output
For comparison:
Gemini 3.0 Flash: $0.50/M for input and $3.00/M for output
Gemini 2.5 Flash: $0.30/M for input and $2.50/M for output
Gemini 2.0 Flash: $0.15/M for input and $0.60/M for output
Gemini 1.5 Flash: $0.075/M for input and $0.30/M for output (after price drop)
Gemini 3.0 Pro: $2.00/M for input and $12/M for output
Gemini 2.5 Pro: $1.25/M for input and $10/M for output
Gemini 1.5 Pro: $1.25/M for input and $5/M for output
I think image input pricing went up even more.
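To make that concrete, here's a tiny back-of-the-envelope using the per-million-token prices listed above; the 20k-in / 2k-out workload is just a made-up example, plug in your own numbers:

    # (input, output) prices in USD per 1M tokens, taken from the list above.
    PRICES = {
        "gemini-3.0-flash": (0.50, 3.00),
        "gemini-2.5-flash": (0.30, 2.50),
        "gemini-2.0-flash": (0.15, 0.60),
        "gemini-3.0-pro":   (2.00, 12.00),
        "gemini-2.5-pro":   (1.25, 10.00),
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        inp, out = PRICES[model]
        return (input_tokens * inp + output_tokens * out) / 1_000_000

    # Example workload: 20k prompt tokens, 2k response tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 20_000, 2_000):.4f} per request")

On that made-up workload, 3.0 Flash lands at roughly a third of 2.5 Pro per request, which matches the "1/3rd of the price" framing earlier in the thread.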
Correction: It is a preview model...
Google has been discontinuing older models after several months of transition period so I would expect the same for the 2.5 models. But that process only starts when the release version of 3 models is out (pro and flash are in preview right now).
You really need to look at the cost per task. artificialanalysis.ai has a good composite score, measures the cost of running all the benchmarks, and has a 2D intelligence vs. cost graph.
Presumably a big motivation for them is to be first to get something good and cheap enough that they can serve it to every Android device, ahead of whatever the OpenAI/Jony Ive hardware project will be, and way ahead of Apple Intelligence. Speaking for myself, I would pay quite a lot for a truly 'AI first' phone that actually worked.
Stuff like:
"Open Chrome, new tab, search for xyz, scroll down, third result, copy the second paragraph, open whatsapp, hit back button, open group chat with friends, paste what we copied and send, send a follow-up laughing tears emoji, go back to chrome and close out that tab"
All while being able to just quickly glance at my phone. There is already a tool like this, but I want the parsing/understanding of an LLM and super fast response times.
On a related note, why would you want to break down your tasks to that level? Surely it should be smart enough to do some of that without you asking, and you can just state your end goal.
Gemini 3 pro got 20%, and everyone else has gotten 0%. I saw benchmarks showing 3 flash is almost trading blows with 3 pro, so I decided to try it.
Basically it is an image showing a dog with 5 legs, an extra one photoshopped onto its torso. Every model counts 4, and Gemini 3 Pro, while also counting 4, said the dog had "large male anatomy". However, it failed a follow-up, saying 4 again.
3 Flash counted 5 legs on the same image, but only after I added a distinct "tattoo" to each leg as an assist. These tattoos didn't help 3 Pro or the other models.
So it is the first out of all the models I have tested to count 5 legs on the "tattooed legs" image. It still counted only 4 legs on the image without the tattoos. I'll give it 1/2 credit.
Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?
Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.
> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.
The replacement for the old Flash models will probably be 3.0 Flash Lite then.
So if 2.5 Pro was good for your use case, you just got a better model for about 1/3rd of the price, but it might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade - which is fair tbh.
It's extremely fast on good hardware, quite smart, and can support up to 1m context with reasonable accuracy
https://epoch.ai/benchmarks/simplebench
With this release the "good enough" and "cheap enough" intersect so hard that I wonder if this is an existential threat to those other companies.
In my experience, to get the best performance out of different models, they need slightly different prompting.
Maybe someday future models will all behave similarly given the same prompt, but we're not quite there yet
https://www.hackerneue.com/item?id=46290797
Opus and Sonnet are slower than Haiku. For lots of less sophisticated tasks, you benefit from the speed.
All vendors do this. You need smaller models that you can rapid-fire for lots of other reasons than vibe coding.
Personally, I actually use more smaller models than the sophisticated ones. Lots of small automations.
You say good enough. Great, but what if I as a malicious person were to just make a bunch of internet pages containing things that are blatantly wrong, to trick LLMs?
I assume that these are just different reasoning levels for Gemini 3, but I can't even find mention of there being 2 versions anywhere, and the API doesn't even mention the Thinking-Pro dichotomy.
Fast = Gemini 3 Flash without thinking (or very low thinking budget)
Thinking = Gemini 3 flash with high thinking budget
Pro = Gemini 3 Pro with thinking
>Fast = 3 Flash
>Thinking = 3 Flash (with thinking)
>Pro = 3 Pro (with thinking)
When I ask Gemini 3 Flash this question, the answer is vague but agency comes up a lot. Gemini thinking is always triggered by a query.
This seems like a higher-level programming issue to me. Turn it into a loop. Keep the context. Those two things make it costly for sure. But does it make it an AGI? Surely Google has tried this?
Which obviously opens up a can of worms regarding who should have authority to supply the "right answer," but still... lacking the core capability, AGI isn't something we can talk about yet.
LLMs will be a part of AGI, I'm sure, but they are insufficient to get us there on their own. A big step forward but probably far from the last.
- An AGI wouldn't hallucinate, it would be consistent, reliable and aware of its own limitations
- An AGI wouldn't need extensive re-training, human reinforced training, model updates. It would be capable of true self-learning / self-training in real time.
- An AGI would demonstrate real genuine understanding and mental modeling, not pattern matching over correlations
- It would demonstrate agency and motivation, not be purely reactive to prompting
- It would have persistent integrated memory. LLM's are stateless and driven by the current context.
- It should even demonstrate consciousness.
And more. I agree that what we've designed is truly impressive and simulates intelligence at a really high level. But true AGI is far more advanced.
I don't believe the "consciousness" qualification is at all appropriate, as I would argue that it is a projection of the human machine's experience onto an entirely different machine with a substantially different existential topology -- relationship to time and sensorium. I don't think artificial general intelligence is a binary label which is applied if a machine rigidly simulates human agency, memory, and sensing.
I'm speculating, but Google might have figured out some training magic trick to balance out the information storage in model capacity. That, or this Flash model has a huge number of parameters or something.
https://artificialanalysis.ai/evaluations/omniscience
Prepare to be amazed
Can someone explain how Gemini 3 pro/flash then do so well then in the overall Omniscience: Knowledge and Hallucination Benchmark?
More experts with a lower percentage of active ones -> more sparsity.
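Rough illustration with completely made-up expert counts (nothing Google has disclosed):

    # Hypothetical mixture-of-experts configs: (total experts, experts active per token).
    configs = {"denser MoE": (64, 8), "sparser MoE": (256, 8)}

    for name, (total, active) in configs.items():
        active_fraction = active / total
        print(f"{name}: {active}/{total} experts active -> {active_fraction:.1%} of expert params used per token")

Same active compute per token, but the sparser config carries 4x the total parameters - which is one plausible way a flash-priced model could hold a lot of world knowledge.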
Their retention controls for both consumer and business suck. It’s the worst of any of the leaders.
It's 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k - notable that the new Flash model doesn’t have a price increase after that 200,000 token point.
It’s also twice the price of GPT-5 Mini for input, half the price of Claude 4.5 Haiku.
For comparison, from 2.5 Pro ($1.25 / $10) to 3 Pro ($2 / $12), there was a 60% increase in input token pricing and a 20% increase in output token pricing.
> Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.
thinkingConfig: { thinkingLevel: "low", }
More about it here https://ai.google.dev/gemini-api/docs/gemini-3#new_api_featu...
On that note it would be nice to get these benchmark numbers based on the different reasoning settings.
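For anyone who wants to poke at this, here's a minimal sketch of setting the thinking level over the plain REST endpoint; the model id `gemini-3-flash-preview` and the exact nesting of `thinkingConfig` under `generationConfig` are my assumptions based on the snippet above, so check them against the linked docs:

    import os
    import requests

    MODEL = "gemini-3-flash-preview"  # assumed preview model id - verify against the docs
    URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

    payload = {
        "contents": [{"parts": [{"text": "Give me a two-sentence summary of mixture-of-experts models."}]}],
        # thinkingLevel replaces the old thinkingBudget knob; "low" keeps latency down.
        "generationConfig": {"thinkingConfig": {"thinkingLevel": "low"}},
    }

    resp = requests.post(URL, params={"key": os.environ["GEMINI_API_KEY"]}, json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])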
Developer Blog: https://blog.google/technology/developers/build-with-gemini-...
Model Card [pdf]: https://deepmind.google/models/model-cards/gemini-3-flash/
Gemini 3 Flash in Search AI mode: https://blog.google/products/search/google-ai-mode-update-ge...
For example, the Gemini 3 Pro collection: https://blog.google/products/gemini/gemini-3-collection/
But having everything linked at the bottom of the announcement post itself would be really great too!
https://artificialanalysis.ai/evaluations/omniscience
https://youtu.be/4p73Uu_jZ10?si=x1gZopegCacznUDA&t=582
It's great that they have these new fast models, but the release hype has made Gemini Pro pretty much unusable for hours.
"Sorry, something went wrong"
random sign-outs
random garbage replies, etc
Now, imagine for a moment they had also vertically integrated the hardware to do this.
The most terrifying thing would be Google expanding its free tiers.
Then you realise you aren't imagining it.
Google is great on the data science alone; everything else is an afterthought.
"And then imagine Google designing silicon that doesn’t trail the industry."
I'm def not a Google stan generally, but uh, have you even been paying attention?
https://en.wikipedia.org/wiki/Tensor_Processing_Unit
TPUs on the other hand are ASICs, we are more than familiar with the limited application, high performance and high barriers to entry associated with them. TPUs will be worthless as the AI bubble keeps deflating and excess capacity is everywhere.
The people who don't have a rudimentary understanding are the wall street boosters that treat it like the primary threat to Nvidia or a moat for Google (hint: it is neither).
Flash is meant to be a model for lower cost, latency-sensitive tasks. Long thinking times will make TTFT >> 10s (often unacceptable) and also won't really be that cheap?
Just avoiding/fixing that would probably speed up a good chunk of my own queries.
Summarize recent working arxiv url
And then it tells me the date is from the future and it simply refuses to fetch the URL.
The image model they have released is much worse than Nano Banana Pro; the Ghibli moment did not happen.
Their GPT 5.2 is obviously overfit on benchmarks, which is the consensus among many developers and friends I know. So Opus 4.5 is staying on top when it comes to coding.
The weight of the ads money from Google and the general direction + founder sense of Brin brought the massive Google giant back to life. None of my company's workflows run on OAI GPT right now. Even though we love their agent SDK, after the Claude agent SDK it feels like peanuts.
This has been true for at least 4 months and yeah, based on how these things scale and also Google's capital + in-house hardware advantages, it's probably insurmountable.
Edit: And just to add an example: openAI's Codex CLI billing is easy for me. I just sign up for the base package, and then add extra credits which I automatically use once I'm through my weekly allowance. With Gemini CLI I'm using my oauth account, and then having to rotate API keys once I've used that up.
Also, Gemini CLI loves spewing out its own chain of thought when it gets into a weird state.
Also Gemini CLI has an insane bias to action that is almost insurmountable. DO NOT START THE NEXT STAGE still has it starting the next stage.
Also Gemini CLI has been terrible at visibility on what it's actually doing at each step - although that seems a bit improved with this new model today.
It's when it becomes difficult, like in the coding case that you mentioned, that we can see OpenAI still has the lead. The same is true for the image model: prompt adherence is significantly better than Nano Banana, especially on more complex queries.
My logic test, and trying to get an agent to develop a certain type of ** implementation (one that is published, and thus the model is trained on it to some limited extent), really stress test models; 5.2 is a complete failure of overfitting.
Really really bad in an unrecoverable infinite loop way.
It helps when you have existing working code that you know a model can't be trained on.
It doesn't actually evaluate the working code; it just assumes it's wrong and starts trying to re-write it as a different type of **.
Even when linking it to the explanation and the git repo of the reference implementation, it still persists in trying to force a different **.
This is the worst model since pre o3. Just terrible.
But for anyone using LLM's to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyways, as the differences between models becomes extremely apparent when you're working in a niche sub-subfield and one model is showing glaring informational or logical errors and another mostly gets it right.
And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)
In fact, so far they consistently fail in exactly these scenarios, glossing over random important details whenever you double-check results in depth.
You might have found models, prompts or workflows that work for you though, I'm interested.
We've seen this movie before. Snapchat was the darling. In fact, it invented the entire category and dominated the format for years. Then it ran out of time.
Now very few people use Snapchat, and it has been reduced to a footnote in history.
If you think I'm exaggerating, that just proves my point.
Just go outside the bubble, plus ask some slightly older people.
Founders are special because they are not beholden to this social support network to stay in power, and founders have a mythos that socially supports their actions beyond their pure power position. The only others they are beholden to are their co-founders, and in some cases major investor groups. This gives them the ability to disregard this social balance because they are not dependent on it to stay in power. Their power source is external to the organization, while everyone else's is internal to it.
This gives them a very special "do something" ability that nobody else has. It can lead to failures (Zuck & Oculus, Snapchat Spectacles) or successes (Steve Jobs, Gemini AI), but either way, it allows them to actually "do something".
Of course they are. Founders get fired all the time. As often as non-founder CEOs purge competition from their peers.
> The only others they are beholden too are their co-founders, and in some cases major investor groups
This describes very few successful executives. You can have your co-founders and investors on board, but if your talent and customers hate you, they'll fuck off.
The merger happened in April 2023.
Gemini 1.0 was released in Dec 2023, and the progress since then has been rapid and impressive.
Ghibli moment was only about half a year ago. At that moment, OpenAI was so far ahead in terms of image editing. Now it's behind for a few months and "it can't be reversed"?
Kara Swisher recently compared OpenAI to Netscape.
the reason this matters is slowing velocity raises the risk of featurization, which undermines LLMs as a category in consumer. cost efficiency of the flash models reinforces this as google can embed LLM functionality into search (noting search-like is probably 50% of chatgpt usage per their july user study). i think model capability was saturated for the average consumer use case months ago, if not longer, so distribution is really what matters, and search dwarfs LLMs in this respect.
https://techcrunch.com/2025/12/05/chatgpts-user-growth-has-s...
https://lmarena.ai/leaderboard/text-to-image
https://lmarena.ai/leaderboard/image-edit
so they get lapped a few times and then drop a fantastic new model out of nowhere
the same is going to happen to Google again, Anthropic again, OpenAI again, Meta again, etc
they're all shuffling the same talent around, it's California, that's how it goes, the companies have the same institutional knowledge - at least regarding their consumer facing options
Out of all the big4 labs, google is the last I'd suspect of benchmaxxing. Their models have generally underbenched and overdelivered in real world tasks, for me, ever since 2.5 pro came out.
Turns out Gemini 3 Flash is pretty close. The Gemini CLI is not as good but the model more than makes up for it.
The weird part is Gemini 3 Pro is nowhere near as good an experience. Maybe because it's just so slow.
Might be using Flash for my MCP research/transcriber/minor-tasks model over Haiku now, though (will test of course)
Well worth every penny now
Pipe dream right now, but 50 years later? Maybe
https://deepmind.google/models/gemini-robotics/
Previous discussions: https://www.hackerneue.com/item?id=43344082
Google keeps their models very "fresh" and I tend to get more correct answers when asking about Azure or O365 issues, ironically copilot will talk about now deleted or deprecated features more often.
The model is very hard to work with as is.
No matter the model, AI Overview/Results in Google are just hallucinated nonsense, only providing roughly equivalent information to what is in the linked sources as a coincidence, rather than due to actually relying on them.
Whether DuckDuckGo, Kagi, Ecosia or anything else, they are all objectively and verifiably better search engines than Google as of today.
This isn't new either, nor has it gotten better. AI Overview has been and continues to be a mess that makes it very clear to me anyone claiming Google is still the "best" search engine results wise is lying to themselves. Anyone saying Google search in 2025 is good or even usable is objectively and verifiably wrong and claiming DDG or Kagi offer less usable results is equally unfounded.
Either fix your models finally so they adhere to and properly quote sources like your competitors somehow manage or, preferably, stop forcing this into search.
Firstly, 3 Flash is wicked fast and seems to be very smart for a low latency model, and it's a rush just watching it work. Much like the YOLO mode that exists in Gemini CLI, Flash 3 seems to YOLO into solutions without fully understanding all the angles e.g. why something was intentionally designed in a way that at first glance may look wrong, but ended up this way through hard won experience. Codex gpt 5.2 xhigh on the other hand does consider more angles.
It's a hard come-down off the high of using it for the first time because I really really really want these models to go that fast, and to have that much context window. But it ain't there. And turns out for my purposes the longer chain of thought that codex gpt 5.2 xhigh seems to engage in is a more effective approach in terms of outcomes.
And I hate that reality because having to break a lift into 9 stages instead of just doing it in a single wicked fast run is just not as much fun!
Just do it.
I use a service where I have access to all SOTA models and many open sourced models, so I change models within chats, using MCPs, e.g. start a chat with Opus making a search with Perplexity and Grok DeepSearch MCPs and Google Search, next query with GPT 5 Thinking xhigh, next one with Gemini 3 Pro, all in the same conversation. It's fantastic! I can't imagine going back to being locked into using one (or two) companies. I have nothing to do with the guys who run it (the hosts of the podcast This Day in AI); if you're interested, have a look in the simtheory.ai Discord.
I don't know how people who use only one service can manage...
Skatval is a small local area I live in, so I know when it's bullshitting. Usually, I get a long-winded answer that is PURE Barnum-statement, like "Skatval is a rural area known for its beautiful fields and mountains" and bla bla bla.
Even with minimal thinking (it seems to do none), it gives an extremely good answer. I am really happy about this.
I also noticed it had VERY good scores on tool-use, terminal, and agentic stuff. If that is TRUE, it might be awesome for coding.
I'm tentatively optimistic about this.
Hoping that the local ones (the Gemma line) keep progressing and keep up.
-> 2.5 Flash Lite is super fast & cheap (~1-1.5s inference), but poor quality responses.
-> 2.5 Flash gives high quality responses, but fairly expensive & slow (5-7s inference)
I really just need an in-between for Flash and Flash Lite for cost and performance. Right now, users have to wait up to 7s for a quality response.
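If you end up A/B testing 3 Flash against 2.5 Flash / Flash Lite for that middle ground, a crude wall-clock check is usually enough to see where it lands; `call_gemini` below is a hypothetical stand-in for whatever client code you already have:

    import time
    import statistics

    def median_latency(send_fn, prompt: str, n: int = 5) -> float:
        """Median wall-clock latency over n calls; send_fn wraps your existing API call."""
        samples = []
        for _ in range(n):
            t0 = time.perf_counter()
            send_fn(prompt)
            samples.append(time.perf_counter() - t0)
        return statistics.median(samples)

    # Usage sketch (call_gemini is hypothetical - wire in your own client):
    # fast = median_latency(lambda p: call_gemini("gemini-3-flash", p), "classify this ticket: ...")
    # lite = median_latency(lambda p: call_gemini("gemini-2.5-flash-lite", p), "classify this ticket: ...")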
1, has anyone actually found 3 Pro better than 2.5 (on non code tasks)? I struggle to find a difference beyond the quicker reasoning time and fewer tokens.
2, has anyone found any non-thinking models better than 2.5 or 3 Pro? So far I find the thinking ones significantly ahead of non thinking models (of any company for that matter.)
I do feel like it's not an entirely accurate caricature (recency bias? limited context?), but it's close enough.
Good work!
You should do a "show HN" if you're not worried about it costing you too much.
I think part of what enables a monopoly is absence of meaningful competition, regardless of how that's achieved -- significant moat, by law or regulation, etc.
I don't know to what extent Google has been rent-seeking and not innovating, but Google doesn't have the luxury to rent-seek any longer.
it's almost as good as 5.2 and 4.5 but way faster and cheaper
I don't view this as a "new Flash" but as "a much cheaper Gemini 3 Pro/GPT-5.2"
Also, I hate that I cannot put the Google models into a "Thinking" mode like in ChatGPT. When I send GPT 5.1 Thinking on a legal task and tell it to check and cite all sources, it takes 10+ minutes to answer, but it did check everything and cite all its sources in the text; whereas the Gemini models, even 3 Pro, always answer after a few seconds and never cite their sources, making it impossible to click through to check the answer. It makes the whole model unusable for these tasks. (I have the $20 subscription for both.)
Definitely has not been my experience using 3 Pro in Gemini Enterprise - in fact just yesterday it took so long to do a similar task I'd thought something was broken. Nope, just re-checking a source.
Just tried once again with the exact same prompt: GPT-5.1-Thinking took 12m46s and Gemini 3.0 Pro took about 20 seconds. The latter obviously has a dramatically worse answer as a result.
(Also, the thinking trace is not in the correct language, and doesn't seem to show which sources have been read at which steps- there is only a "Sources" tab at the end of the answer.)
I'm more excited to see 3 Flash Lite. Gemini 2.5 Flash Lite needs a lot more steering than regular 2.5 Flash, but it is a very capable model and combined with the 50% batch mode discount it is CHEAP ($0.05/$0.20).
I just always thought the taste of gpt or claude models was more interesting in the professional context and their end user chat experience more polished.
are there obvious enterprise use cases where gemini models shine?
Also, I don't see it written in the blog post, but Flash supports more granular settings for reasoning: minimal, low, medium, high (like OpenAI models), while Pro only has low and high.
> Matches the “no thinking” setting for most queries. The model may think very minimally for complex coding tasks. Minimizes latency for chat or high throughput applications.
I'd prefer a hard "no thinking" rule than what this is.
Wasn't this the case with the 2.5 Flash models too? I remember being very confused at that time.
To me it seems like the big model has been "look what we can do", and the smaller model is "actually use this one though".
ChatGPT still has 81% market share as of this very moment, vs Gemini's ~2%, and arguably still provides the best UX and branding.
Everyone and their grandma knows "ChatGPT", who outside developers' bubble has even heard of Gemini Flash?
Yea I don't think that dynamic is switching any time soon.
where did you get this from?
You're not doing anything wrong. Everyone knows what you're doing. You have no secrets to hide.
Yet you value your privacy anyway. Why?
Also - I have no problem using Anthropic's cloud-hosted services. Being opposed to some cloud providers doesn't mean I'm opposed to all cloud providers.
Anthropic - one of GCP’s largest TPU customers? Good for you.
https://www.anthropic.com/news/expanding-our-use-of-google-c...