What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers?
Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.
Google sounds like a fun place to work, run it up the flagpole and see if you can move the needle before the next hard stop for me.
For an external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops team to set it up and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.
from google import genai
from google.oauth2 import service_account

# Service-account credentials from a key file (SA_FILE, PROJECT_ID and
# LOCATION are placeholders).
creds = service_account.Credentials.from_service_account_file(
    SA_FILE,
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/generative-language",
    ],
)

# vertexai=True routes the client through Vertex AI with those credentials.
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
    http_options={"api_version": "v1beta1"},
    credentials=creds,
)
That `vertexai=True` does the trick: use the same code without this option and you will not be using "Vertex". Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
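For contrast, the non-Vertex path is just an API key. A minimal sketch, assuming an AI Studio key in a GEMINI_API_KEY env var (my naming, not from the snippet above):

import os
from google import genai

# Same SDK, no Vertex: an AI Studio API key instead of a service account.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])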
For me, the main aspect of "using Vertex", as in this example, is the fact that the Start AI Cloud Credit ($350K) is only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days to me, with Google now pushing their enterprise-grade ML Ops platform, but all in all I am grateful for their generosity and the great Gemini model.
As a replacement for SA key files, one can use e.g. user accounts with SA impersonation, external identity providers, or run on a GCP VM or GKE and use the built-in identities.
(ref: https://cloud.google.com/iam/docs/migrate-from-service-accou...)
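For example, SA impersonation can replace the key file entirely. A hedged sketch, assuming the caller has roles/iam.serviceAccountTokenCreator on the target SA (the SA email is a placeholder):

import google.auth
from google.auth import impersonated_credentials

# Start from whatever ambient identity exists (user, VM, GKE workload)...
source_creds, _ = google.auth.default()

# ...and mint short-lived credentials for the target service account.
creds = impersonated_credentials.Credentials(
    source_credentials=source_creds,
    target_principal="my-sa@my-project.iam.gserviceaccount.com",
    target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
    lifetime=3600,  # seconds
)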
I still don't understand the distinction between the Gemini and Vertex AI APIs. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem, but it's only created more confusion, for me at least.
Vertex AI is for gRPC, service auth, and region control (amongst other things): ensuring data remains in a specific region, letting you auth with the instance's service account, and getting slightly better latency and TTFT.
For deploying from GitHub, I just use a dedicated service account for CI/CD and put the JSON payload in an environment secret, like an API key. The only extra step is that you need to copy it to the filesystem for some things to work, usually to a file named google_application_credentials.json.
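That copy step is a small sketch in Python, assuming the secret is exposed to the job as a GCP_SA_KEY env var (a hypothetical name):

import os

# Write the JSON key from the CI secret to disk, then point the standard
# env var at it so client libraries pick it up automatically.
key_path = "google_application_credentials.json"
with open(key_path, "w") as f:
    f.write(os.environ["GCP_SA_KEY"])  # hypothetical secret name
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(key_path)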
If you use Cloud Build, you shouldn't need to do anything.
And even if you don't ask, there are many examples. But I feel ya - the right example to fit your needs is hard to find.
- There are principals (users, service accounts).
- Each one needs to authenticate in some way. There are options here: SAML, OIDC, or Google Sign-In for users; other options for service accounts.
- Permissions guard the things you can do in Google Cloud.
- There are built-in roles that wrap up sets of permissions.
- You can create your own custom roles.
- Attach roles to principals to give them parcels of permissions (see the sketch below).
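To make the last step concrete, a hedged sketch of attaching a role to a principal via the Resource Manager API; the project ID, SA email, and role are placeholders, not anything prescribed above:

from googleapiclient import discovery

# Read-modify-write of the project-level IAM policy, using application
# default credentials.
crm = discovery.build("cloudresourcemanager", "v1")
project = "my-project-id"  # placeholder

policy = crm.projects().getIamPolicy(resource=project, body={}).execute()
policy["bindings"].append({
    "role": "roles/aiplatform.user",
    "members": ["serviceAccount:ci@my-project-id.iam.gserviceaccount.com"],
})
crm.projects().setIamPolicy(resource=project, body={"policy": policy}).execute()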
It's not complicated in the context of huge enterprise applications, but for most people trying to use Google's LLMs it's much more confusing than using an API key. The parent commenter is probably using an AWS secret key.
And FWIW this is basically what google encourages you to do with firebase (with the admin service account credential as a secret key).
> If you want to disable thinking, you can set the reasoning effort to "none".
For other APIs, you can set the thinking tokens to 0 and that also works.
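Concretely, both knobs might look like this. A sketch, assuming the google-genai SDK for the native path and the OpenAI-compatible endpoint for reasoning_effort; the model name and key are placeholders:

from google import genai
from google.genai import types
from openai import OpenAI

# Native API: a thinking budget of 0 tokens disables thinking.
client = genai.Client(api_key="GEMINI_API_KEY")  # placeholder key
resp = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model
    contents="2 + 2 = ?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)

# OpenAI-compatible endpoint: reasoning effort "none".
oai = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
chat = oai.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "2 + 2 = ?"}],
    reasoning_effort="none",
)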
BTW, I have noticed that when tested outside GCP, the OpenAI compat endpoint has significantly lower latency for most requests (vs using the genai library). Vertex AI is better than both.
Any idea why or if that will change?
Java/JS is in preview (not ready for production) and will be GA soon!
As there are so many variations out there, the AI gets majorly confused. As a matter of fact, the Google OAuth part is the one thing Gemini 2.5 Pro can't code.
It should be its own benchmark.
Happy to provide test cases as well if helpful.
For folks just wanting to get started quickly with Gemini models, without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended, as you noted.
However, if you anticipate your use case growing and scaling 10-1000x in production, Vertex would be a worthwhile investment.
And you are watching us evolve over time to do better.
A couple of clarifications:
1. Going forward, we only recommend using the genai SDK.
2. Subtle API differences: this is a bit harder to articulate, but we are working to improve this.
Please DM @chrischo_pm if you would like to discuss further :)
No idea what any of those SDK names mean. But sure enough, searching will bring up all three of them for different combinations of search terms, and none of them will point to the "recommend only using <a random name that is indistinguishable from other names>".
Oh, and some of these SDKs (and docs) do have a way to use this functionality without the SDKs, but not others. Because there are only 4 languages in the world, and everyone should be happy using them.
Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.
One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.
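For what it's worth, the Search-grounding flow with the google-genai SDK is already fairly compact. A sketch, with the model name and prompt as my own placeholders:

from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")  # placeholder key

# Attach the Google Search tool so the model can ground its answer.
resp = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model
    contents="What changed in the latest Gemini API release?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    ),
)
print(resp.text)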
EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.
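Context caching, for example, is a two-step dance in the google-genai SDK. A sketch, with the model name, TTL, and payload variable as assumptions:

from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")  # placeholder key

# Upload the large context once...
cache = client.caches.create(
    model="gemini-2.0-flash-001",  # caching wants an explicit model version
    config=types.CreateCachedContentConfig(
        system_instruction="Answer questions about this transcript.",
        contents=[long_transcript],  # hypothetical variable holding the payload
        ttl="3600s",
    ),
)

# ...then reference it per request instead of resending it.
resp = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Summarize the main points.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(resp.text)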
- Vertex AI
- AI Studio
- Gemini
- Firebase Gen AI
Just stick with AI Studio and the free developer AI along with it; you will be much, much happier.
Do Google use all the AI Studio traffic to train etc.?
If you can ignore Vertex, most of the complaints here are solved - the non-Vertex APIs have easy-to-use API keys, a great debugging tool (https://aistudio.google.com), a well-documented HTTP API, and good client libraries too.
I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straight-forward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
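For reference, a minimal non-streaming version of that call looks something like this (a sketch; the model name and env var are my assumptions, not from the linked code):

import os
import requests

url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-2.0-flash:generateContent"
)
resp = requests.post(
    url,
    params={"key": os.environ["GEMINI_API_KEY"]},
    json={"contents": [{"parts": [{"text": "Say hi in five words."}]}]},
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])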
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation, though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
Anthropic have the same feature now as well: https://docs.anthropic.com/en/api/openai-sdk
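Switching an existing OpenAI-client app over is mostly a base_url and key change. A sketch against the documented Gemini endpoint; the model name is an example:

from openai import OpenAI

# The same client code, pointed at Gemini's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="GEMINI_API_KEY",  # an AI Studio key, not an OpenAI one
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
resp = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)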