The high-level API seems very smooth for quickly iterating on RAG testing. It looks great for prototyping; however, I have doubts about whether it's a good idea to hide the LLM-calling logic in a DB extension.

Error handling would be problematic when you get rate limited, your API key has expired, or the prompt exceeds the context length. And from a security point of view, it requires your DB to call OpenAI directly, which can also be risky.
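
To make that concrete, this is roughly the error handling an application would normally own around a chat completion call, and which is hard to surface or tune when the call happens inside an extension. A minimal sketch against the raw HTTP API; the model name and retry policy are just placeholders:

    import time
    import requests

    OPENAI_URL = "https://api.openai.com/v1/chat/completions"

    def chat(messages, api_key, max_retries=3):
        # Retry/abort decisions that belong to the application,
        # not to a Postgres extension.
        for attempt in range(max_retries):
            resp = requests.post(
                OPENAI_URL,
                headers={"Authorization": f"Bearer {api_key}"},
                json={"model": "gpt-4o-mini", "messages": messages},
                timeout=30,
            )
            if resp.status_code == 429:  # rate limited: back off and retry
                time.sleep(2 ** attempt)
                continue
            if resp.status_code == 401:  # expired/invalid key: don't retry
                raise RuntimeError("API key rejected; rotate credentials")
            if resp.status_code == 400:  # e.g. prompt exceeds the context length
                raise ValueError(resp.json().get("error", {}).get("message"))
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]["content"]
        raise RuntimeError("still rate limited after retries")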

Personally, I haven't used that many Postgres extensions, so perhaps these risks are mitigated in some way I'm not aware of?


We are working on a 'self-hosted' alternative to OpenAI. The project already supports this for embeddings: you specify an open-source model from Hugging Face / sentence-transformers, and API calls get routed to a service that you self-host in a container next to Postgres. This is how the docker-compose example in the project README is set up. We'll follow the same pattern for chat completion models.
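
For anyone curious, that container is essentially a small HTTP service wrapping the model. A minimal sketch, not the project's actual service; the /embed route and payload shape are assumptions:

    from fastapi import FastAPI
    from pydantic import BaseModel
    from sentence_transformers import SentenceTransformer

    app = FastAPI()
    # Any open-source Hugging Face / sentence-transformers model works here.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    class EmbedRequest(BaseModel):
        texts: list[str]

    @app.post("/embed")  # route name is an assumption, not the project's API
    def embed(req: EmbedRequest):
        vectors = model.encode(req.texts, normalize_embeddings=True)
        return {"embeddings": [v.tolist() for v in vectors]}

Run something like that next to Postgres (e.g. with uvicorn in its own container) and the extension's API calls stay on your network instead of going out to OpenAI.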

On Tembo Cloud, we deploy this as part of the VectorDB and RAG Stacks: you get a dedicated Postgres instance plus a container next to it that hosts the text-to-embedding transformers. The API calls and data never leave your namespace.

I would agree with you. It's similar to LangChain in my mind: some interesting ideas, but it's a lot more powerful to implement things on your own. I would much rather use pgvector directly.
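
For reference, "using pgvector directly" can be as small as this. A sketch assuming a docs table with a 384-dimension embedding column; the table, column, and connection string are hypothetical:

    # Assumes: CREATE EXTENSION vector;
    #          CREATE TABLE docs (content text, embedding vector(384));
    import psycopg
    from pgvector.psycopg import register_vector
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # emits 384-dim vectors

    def top_k(question: str, k: int = 5) -> list[str]:
        query_vec = model.encode(question)
        with psycopg.connect("dbname=rag") as conn:
            register_vector(conn)  # register the vector type with psycopg
            rows = conn.execute(
                "SELECT content FROM docs ORDER BY embedding <=> %s LIMIT %s",
                (query_vec, k),
            ).fetchall()
        return [r[0] for r in rows]

You keep the model call, the SQL, and the error handling in your own code, which is kind of the point.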
