What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers?
Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.
Google sounds like a fun place to work, run it up the flagpole and see if you can move the needle before the next hard stop for me.
For an external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops team to set it up and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.
from google import genai
from google.oauth2 import service_account

# Service-account credentials from a key file (SA_FILE, PROJECT_ID and
# LOCATION are placeholders).
creds = service_account.Credentials.from_service_account_file(
    SA_FILE,
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/generative-language",
    ],
)

# vertexai=True routes the client through Vertex AI with those credentials.
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
    http_options={"api_version": "v1beta1"},
    credentials=creds,
)
That `vertexai=True` does the trick: use the same code without this option and you will not be using "Vertex". Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
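For contrast, the non-Vertex path is just an API key. A minimal sketch, assuming an AI Studio key in a GEMINI_API_KEY env var (my naming, not from the snippet above):

import os
from google import genai

# Same SDK, no Vertex: an AI Studio API key instead of a service account.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])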
For me, the main aspect of "using Vertex", as in this example, is the fact that the Start AI Cloud Credit ($350K) is only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days to me, with Google now pushing their enterprise-grade ML Ops platform, but all in all I am grateful for their generosity and the great Gemini model.
As a replacement for SA key files, one can use e.g. user accounts with SA impersonation, external identity providers, or run on a GCP VM or GKE and use the built-in identities.
(ref: https://cloud.google.com/iam/docs/migrate-from-service-accou...)
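For example, SA impersonation can replace the key file entirely. A hedged sketch, assuming the caller has roles/iam.serviceAccountTokenCreator on the target SA (the SA email is a placeholder):

import google.auth
from google.auth import impersonated_credentials

# Start from whatever ambient identity exists (user, VM, GKE workload)...
source_creds, _ = google.auth.default()

# ...and mint short-lived credentials for the target service account.
creds = impersonated_credentials.Credentials(
    source_credentials=source_creds,
    target_principal="my-sa@my-project.iam.gserviceaccount.com",
    target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
    lifetime=3600,  # seconds
)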
I still don't understand the distinction between the Gemini and Vertex AI APIs. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem, but it's only created more confusion, for me at least.
Vertex AI is for gRPC, service auth, and region control (amongst other things): ensuring data remains in a specific region, letting you auth with the instance's service account, and getting slightly better latency and TTFT.
For deploying from GitHub, I just use a dedicated service account for CI/CD and put the JSON payload in an environment secret, like an API key. The only extra step is that you need to copy it to the filesystem for some things to work, usually to a file named google_application_credentials.json.
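That copy step is a small sketch in Python, assuming the secret is exposed to the job as a GCP_SA_KEY env var (a hypothetical name):

import os

# Write the JSON key from the CI secret to disk, then point the standard
# env var at it so client libraries pick it up automatically.
key_path = "google_application_credentials.json"
with open(key_path, "w") as f:
    f.write(os.environ["GCP_SA_KEY"])  # hypothetical secret name
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(key_path)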
If you use Cloud Build, you shouldn't need to do anything.
And even if you don't ask, there are many examples. But I feel ya - the right example to fit your needs is hard to find.
- There are principals (users, service accounts).
- Each one needs to authenticate in some way. There are options here: SAML, OIDC, or Google Sign-In for users; other options for service accounts.
- Permissions guard the things you can do in Google Cloud.
- There are built-in roles that wrap up sets of permissions.
- You can create your own custom roles.
- Attach roles to principals to give them parcels of permissions (see the sketch below).
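To make the last step concrete, a hedged sketch of attaching a role to a principal via the Resource Manager API; the project ID, SA email, and role are placeholders, not anything prescribed above:

from googleapiclient import discovery

# Read-modify-write of the project-level IAM policy, using application
# default credentials.
crm = discovery.build("cloudresourcemanager", "v1")
project = "my-project-id"  # placeholder

policy = crm.projects().getIamPolicy(resource=project, body={}).execute()
policy["bindings"].append({
    "role": "roles/aiplatform.user",
    "members": ["serviceAccount:ci@my-project-id.iam.gserviceaccount.com"],
})
crm.projects().setIamPolicy(resource=project, body={"policy": policy}).execute()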
It's not complicated in the context of huge enterprise applications, but for most people trying to use Google's LLMs it's much more confusing than using an API key. The parent commenter is probably using an AWS secret key.
And FWIW this is basically what google encourages you to do with firebase (with the admin service account credential as a secret key).
> If you want to disable thinking, you can set the reasoning effort to "none".
For other APIs, you can set the thinking tokens to 0 and that also works.
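Concretely, both knobs might look like this. A sketch, assuming the google-genai SDK for the native path and the OpenAI-compatible endpoint for reasoning_effort; the model name and key are placeholders:

from google import genai
from google.genai import types
from openai import OpenAI

# Native API: a thinking budget of 0 tokens disables thinking.
client = genai.Client(api_key="GEMINI_API_KEY")  # placeholder key
resp = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model
    contents="2 + 2 = ?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)

# OpenAI-compatible endpoint: reasoning effort "none".
oai = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
chat = oai.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "2 + 2 = ?"}],
    reasoning_effort="none",
)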
BTW, I have noticed that when tested outside GCP, the OpenAI compat endpoint has significantly lower latency for most requests (vs using the genai library). Vertex AI is better than both.
Any idea why or if that will change?
Java/JS is in preview (not ready for production) and will be GA soon!
As there are so many variations out there, the AI gets majorly confused. As a matter of fact, the Google OAuth part is the one thing Gemini 2.5 Pro can't code.
It should be its own benchmark.
Happy to provide test cases as well if helpful.
For folks just wanting to get started quickly with Gemini models, without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended, as you noted.
However, if you anticipate your use case growing and scaling 10-1000x in production, Vertex would be a worthwhile investment.
And you are watching us evolve over time to do better.
A couple of clarifications:
1. Going forward, we only recommend using the genai SDK.
2. Subtle API differences: this is a bit harder to articulate, but we are working to improve this.
Please DM @chrischo_pm if you would like to discuss further :)
No idea what any of those SDK names mean. But sure enough, searching will bring up all three of them for different combinations of search terms, and none of them will point to the "recommend only using <a random name that is indistinguishable from other names>".
Oh, and some of these SDKs (and docs) do have a way to use this functionality without the SDKs, but not others. Because there are only 4 languages in the world, and everyone should be happy using them.
Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.
One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.
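For what it's worth, the Search-grounding flow with the google-genai SDK is already fairly compact. A sketch, with the model name and prompt as my own placeholders:

from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")  # placeholder key

# Attach the Google Search tool so the model can ground its answer.
resp = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model
    contents="What changed in the latest Gemini API release?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    ),
)
print(resp.text)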
EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.
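Context caching, for example, is a two-step dance in the google-genai SDK. A sketch, with the model name, TTL, and payload variable as assumptions:

from google import genai
from google.genai import types

client = genai.Client(api_key="GEMINI_API_KEY")  # placeholder key

# Upload the large context once...
cache = client.caches.create(
    model="gemini-2.0-flash-001",  # caching wants an explicit model version
    config=types.CreateCachedContentConfig(
        system_instruction="Answer questions about this transcript.",
        contents=[long_transcript],  # hypothetical variable holding the payload
        ttl="3600s",
    ),
)

# ...then reference it per request instead of resending it.
resp = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Summarize the main points.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(resp.text)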
- Vertex AI
- AI Studio
- Gemini
- Firebase Gen AI
Just stick with AI Studio and the free developer AI along with it; you will be much, much happier.
Do Google use all the AI Studio traffic to train etc.?
If you can ignore Vertex, most of the complaints here are solved - the non-Vertex APIs have easy-to-use API keys, a great debugging tool (https://aistudio.google.com), a well-documented HTTP API, and good client libraries too.
I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straight-forward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
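For reference, a minimal non-streaming version of that call looks something like this (a sketch; the model name and env var are my assumptions, not from the linked code):

import os
import requests

url = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-2.0-flash:generateContent"
)
resp = requests.post(
    url,
    params={"key": os.environ["GEMINI_API_KEY"]},
    json={"contents": [{"parts": [{"text": "Say hi in five words."}]}]},
)
resp.raise_for_status()
print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])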
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation, though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
Anthropic have the same feature now as well: https://docs.anthropic.com/en/api/openai-sdk
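Switching an existing OpenAI-client app over is mostly a base_url and key change. A sketch against the documented Gemini endpoint; the model name is an example:

from openai import OpenAI

# The same client code, pointed at Gemini's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="GEMINI_API_KEY",  # an AI Studio key, not an OpenAI one
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
resp = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)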