We are working on a 'self-hosted' alternative to OpenAI. The project already supports this for embeddings: you specify an open-source model from Hugging Face/sentence-transformers, and API calls get routed to a service you're self-hosting in a container next to Postgres. This is how the docker-compose example in the project README is set up. We'll follow the same pattern for chat completion models.
On Tembo Cloud, we deploy this as part of the VectorDB and RAG Stacks. So you get a dedicated Postgres instance, plus a container next to Postgres that hosts the text-to-embedding transformers. The API calls/data never leave your namespace.
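To make the pattern concrete, here's a minimal sketch of what that sibling embedding container could look like, assuming Flask and the sentence-transformers library; the model name, route, and port are illustrative placeholders, not the project's actual configuration:

```python
# Minimal sketch of a self-hosted embedding service that runs in a container
# next to Postgres. Flask and sentence-transformers are assumed; the model,
# route, and port are placeholders, not the project's real defaults.
from flask import Flask, jsonify, request
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
# Any open-source model from Hugging Face / sentence-transformers works here.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

@app.route("/v1/embeddings", methods=["POST"])
def embeddings():
    # Accept an OpenAI-style request body: {"input": ["text one", "text two"]}
    texts = request.get_json()["input"]
    vectors = model.encode(texts).tolist()
    return jsonify({"data": [{"embedding": v, "index": i} for i, v in enumerate(vectors)]})

if __name__ == "__main__":
    # Runs as a sibling container in docker-compose; Postgres reaches it by service name.
    app.run(host="0.0.0.0", port=3000)
```

The idea is that the extension's embedding endpoint is pointed at this container rather than at api.openai.com, so requests stay inside your network.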
I would agree with you. It's similar to LangChain in my mind: some interesting ideas, but a lot more powerful to implement on your own. I would much rather use pgvector directly.
Error handling would be problematic when you get rate limited, the token has expired, or the prompt exceeds the token limit; and from a security point of view, it requires your DB to call OpenAI directly, which can also be risky.
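To illustrate the concern, here's a rough sketch of the kind of client-side handling a DB-side extension would need for those three failure modes; the retry policy and error mapping are assumptions for illustration, not any particular extension's behavior:

```python
# Rough sketch of the error handling needed when calling a hosted completion
# API from inside the database. The retry/backoff policy and error mapping
# are illustrative assumptions.
import time
import requests

def complete(prompt: str, api_key: str, retries: int = 3) -> str:
    for attempt in range(retries):
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"model": "gpt-3.5-turbo",
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=30,
        )
        if resp.status_code == 429:   # rate limited: back off and retry
            time.sleep(2 ** attempt)
            continue
        if resp.status_code == 401:   # expired/invalid token: retrying won't help
            raise RuntimeError("API token rejected; rotate the key")
        if resp.status_code == 400:   # e.g. prompt exceeds the context window
            raise ValueError("request rejected; prompt may exceed the token limit")
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise TimeoutError("still rate limited after retries")
```

Each of these branches has to surface sensibly as a SQL error or a retry inside a transaction, which is where it gets awkward.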
Personally I haven't used that many Postgres extensions, so perhaps these risks are mitigated in some way I don't know about?