Find me at narekg at berkeley dot edu
Email here for priority response:
WITH hash AS ( SELECT string_to_array(md5('4'), NULL) as h ) SELECT 'narekg+' || h[15] || h[16] || h[13] || h[1] || '@berkeley.edu' AS priority_email FROM hash;
- They would need to handle all the translation changes as well, no?
<https://github.com/search?q=repo%3Apostgres%2Fpostgres+%22ma...>
I agree the code change is simple, but I guess the task is complex for other reasons
- Surprised not to see Ubicloud in there; it provides cloud services on top of various (including European) infra providers.
- This is really cool, congrats on launch!
I am curious how you prevent private data from leaking into the auto-generated public docs. I imagine this problem does not exist in open source projects, but it would become an issue when not everything discussed in a company's private messenger should be used as context for generating docs.
- Author here.
> I don’t think “get moar ram” is a good response to that particular critique.
I do not think the blog post suggested "get more ram" as a response, but happy to clarify if you could share more details!
> Indexing in Postgres is legitimately painful
Lantern is here to make the process seamless and remove most of the pain for people building LLM/AI applications. Examples:
1. We build tools to remove the guesswork of HNSW index sizing. E.g. https://lantern.dev/blog/calculator (see the sketch after this list)
2. We analyze typical patterns people use when building LLM apps and suggest better practices. E.g. https://lantern.dev/blog/async-embedding-tables
3. We build alerts and triggers into our cloud database that automate the discovery of many issues via heuristics.
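For context on what the sizing tooling feeds into, here is a minimal index-creation sketch (the table, column, and parameter values are illustrative, not prescriptive; M and ef_construction are the knobs that drive index size and build time):

    CREATE EXTENSION IF NOT EXISTS lantern;

    -- Hypothetical table of 768-dimensional embeddings.
    CREATE TABLE items (id serial PRIMARY KEY, embedding real[768]);

    -- M and ef_construction trade index size and build time against recall;
    -- the calculator linked above helps pick values for a target workload.
    CREATE INDEX ON items USING hnsw (embedding dist_l2sq_ops)
        WITH (M = 16, ef_construction = 128);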
- Author here. We will benchmark this thoroughly in the future for our vector indexes.
But at least anecdotally, it made a ton of difference.
We met a <200ms latency budget with Ubicloud NVMes, but had to wait seconds for an answer to the same query with GCP persistent disks or local SSDs.
- I have tried CLIP on my personal photo album and it worked really well there: I could write detailed scene descriptions of past road trips, and the photos I had in mind would pop up. The model is probably better for everyday photos than for icons.
- We provide this functionality in Lantern cloud via our Lantern Extras extension: <https://github.com/lanterndata/lantern_extras>
You can generate CLIP embeddings locally on the DB server via:

    SELECT abstract, introduction, figure1,
           clip_text(abstract) AS abstract_ai,
           clip_text(introduction) AS introduction_ai,
           clip_image(figure1) AS figure1_ai
    INTO papers_augmented
    FROM papers;

Then you can search for embeddings via:

    SELECT abstract, introduction
    FROM papers_augmented
    ORDER BY clip_text(query) <=> abstract_ai
    LIMIT 10;

The approach significantly decreases search latency and results in cleaner code. As an added bonus, EXPLAIN ANALYZE can now tell the percentage of time spent in embedding generation vs search. The linked library enables embedding generation for a dozen open source models and proprietary APIs (list here: <https://lantern.dev/docs/develop/generate>), and adding new ones is really easy.
- For similar isolation level anomalies in real world applications check out this SIGMOD '17 paper:
ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications: http://www.bailis.org/papers/acidrain-sigmod2017.pdf
- Not sure what the approach of this library is, but can't you generate a nonce from a larger alphabet, hash the column values with the nonce `hash(nonce || column)`, and crypto-shred the nonce at the end?
Then, during hashing, you just need a constant immutable state, which effectively expands the hash space without incurring the mutable-state overhead of the replacement-strings strategy.
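A minimal sketch of that idea in Postgres, assuming the pgcrypto extension (the table and column names are hypothetical):

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    -- One nonce for the whole run, drawn from a large space;
    -- a single-row table makes it easy to crypto-shred later.
    CREATE TABLE anon_nonce AS SELECT gen_random_bytes(32) AS nonce;

    -- Pseudonymize a column as hash(nonce || value).
    SELECT encode(digest((SELECT nonce FROM anon_nonce) || convert_to(email, 'UTF8'),
                         'sha256'), 'hex') AS email_pseudonym
    FROM users;

    -- Crypto-shred: with the nonce gone, small value spaces
    -- can no longer be brute-forced from the hashes.
    DROP TABLE anon_nonce;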
- this is cool!
Does this only collect logs from the frontend? Or can it also collect backend and DB latency data related to a frontend interaction?
- My understanding is that Trusted Language Extensions refers to extensions written in PL/Rust - a Postgres extension mechanism for writing user-defined functions and using them in SQL queries.
PL/Rust is a more performant and more feature-rich alternative to PL/pgSQL, which is the traditional UDF scripting language for Postgres.
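For a sense of the interface, a minimal PL/Rust UDF looks roughly like this (an untested sketch following the conventions in the PL/Rust docs):

    CREATE FUNCTION strlen(name TEXT) RETURNS int
    LANGUAGE plrust AS
    $$
        // Arguments arrive as Option<_>; None represents SQL NULL.
        Ok(Some(name.unwrap().len() as i32))
    $$;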
Building a vector index (or any index, for that matter) inside Postgres is a more involved process and cannot be done via the UDF interface, be it Rust, C, or PL/pgSQL.
So, I think even if Lantern was written in Rust, it would not be a Trusted Language Extension under this definition.
- cofounder here.
You are right that there are many trade-offs between HNSW and IVFFLAT.
E.g. IVFFLAT requires there to be a significant amount of data in the table before the index is created, and assumes the data distribution does not change with additional inserts (since it chooses centroids during the initial creation and never updates them).
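To make that concrete, this is roughly what it looks like with pgvector (the table and column names are hypothetical):

    -- Populate the table first: IVFFLAT picks its centroids at build time.
    CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops)
        WITH (lists = 100);
    -- Rows inserted afterwards are assigned to the existing, frozen centroids.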
We have also generally had a harder time getting high recall with IVFFLAT on vectors from embedding models such as ada-002.
There are trade-offs, some of which we will explore in later blog posts.
This post is about one thing - HNSW index creation time across two systems, at a fixed 99% recall.
- It is strange that there is no mention of Ruben Vardanyan, even though the FAST foundation is mentioned.
But still cool to read about Armenia's tech sector here.
- We have not run microbenchmarks to see which dimension ranges perform best, but those are coming soon! Below is an anecdotal answer:
We run our CI/CD benchmarks on 128-dimensional SIFT vectors. We have some demos using CLIP embeddings (512 dimensions) and BAAI bge embeddings (768 dimensions).
Generally, smaller vectors allow higher throughput and result in smaller indexes, but the effect on performance is small. Once we merge the PR implementing vector element casts to 1- and 2-byte floats, the effect on throughput should be even smaller.
- This sounds like a very useful feature, and we will prioritize this.
You’re correct that IVFFLAT would be faster for your use case. However, IVFFLAT’s shortcoming is bad recall, which means less relevant results for your application. We believe that our HNSW implementation (or other indexes) can handle use cases like yours.
We currently handle a similar use case by rerunning our index searches with exponentially increasing LIMITs and dropping the results that are not needed. Could you send us an email at support@lantern.dev? We can generate the numbers by this weekend and get back to you with concrete results.
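For illustration, a minimal PL/pgSQL sketch of that retry pattern (papers_augmented, its published filter column, and the <-> operator are all hypothetical stand-ins):

    CREATE OR REPLACE FUNCTION filtered_ann_search(query_vec real[], wanted int)
    RETURNS SETOF papers_augmented AS $$
    DECLARE
        lim int := wanted;
        found int;
    BEGIN
        -- Grow the candidate LIMIT exponentially until enough rows pass the filter.
        LOOP
            SELECT count(*) INTO found
            FROM (SELECT published FROM papers_augmented
                  ORDER BY abstract_ai <-> query_vec
                  LIMIT lim) c
            WHERE c.published;
            EXIT WHEN found >= wanted OR lim >= 10000;
            lim := lim * 2;
        END LOOP;
        -- Rerun once at the final LIMIT and drop the rows that are not needed.
        RETURN QUERY
            SELECT * FROM (SELECT * FROM papers_augmented
                           ORDER BY abstract_ai <-> query_vec
                           LIMIT lim) c
            WHERE c.published
            LIMIT wanted;
    END
    $$ LANGUAGE plpgsql;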
By the way – not sure if you saw it in our blog post, but if you’re using pgvector in production and switch to Lantern, we’ll help you every single step of the way. It’s very quick, and we’ll also send you some free AirPods Pro at the end of it!
- >When you say "produced locally", do you mean on the client?
Sorry for the confusion. By “produced locally” I meant “produced on your DB server” as opposed to being an API call to a third party service such as OpenAI or HuggingFace.
>(But, like... doing it remotely--on the database server as part of the query plan--frankly seems kind of crazy to me, as it is going to be so slow and add a massive CPU load to what should be an I/O workload. Makes for good demos I bet, but otherwise unusable in a database context.)
It seems like you’re worried about these workflows being on the Postgres server, which may lead to performance issues.
However, if performance becomes an issue, the functions can be executed on another server. In this approach, whether or not the functions run on the Postgres server, the end user gets a better development experience, as all the functions they need are available within SQL.
>...this frankly shouldn't be part of the same extension

We agree. These functions are already in another repository, and not part of the same extension. The repository is here: https://github.com/lanterndata/lantern_extras
- Our index access method will be called lantern_hnsw if pgvector or any other provider has already taken the hnsw access method name.
Btw, we did not create our own vector type; we just use size-enforced real[] arrays to represent embeddings. However, you can use our index with pgvector's vector type. So, if you already have a table with pgvector's vector column type, you can start using Lantern by just creating an index on the same column.
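Concretely, migration can be as small as one statement (the operator class name here is illustrative; check the docs for the exact spelling):

    -- Existing pgvector table keeps its column type...
    CREATE TABLE docs (id serial PRIMARY KEY, v vector(768));

    -- ...and Lantern indexes that same column directly.
    CREATE INDEX ON docs USING lantern_hnsw (v dist_l2sq_ops);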
- Pgvector builds a vector index.
Our extension, similarly, builds an index but also extends SQL in more ways.
For example,
- Generating embeddings to augment plain relational data
- Using data from local proprietary embedding models or third-party model APIs in queries.
We have more things planned like vector versioning, data retention policies and recall regression tracking.
> What does "without leaving your database" mean in this context?
You can work with embeddings using just SQL. For instance, a table of academic papers can be augmented with CLIP model embeddings produced locally. This entire process - creating, storing, and querying - happens using just SQL:

    SELECT abstract, introduction, figure1,
           clip_text(abstract) AS abstract_ai,
           clip_text(introduction) AS introduction_ai,
           clip_image(figure1) AS figure1_ai
    INTO papers_augmented
    FROM papers;

    SELECT abstract, introduction
    FROM papers_augmented
    ORDER BY abstract_ai <-> clip_text('The Foundation of the General Theory of Relativity')
    LIMIT 10;
- Is this documented anywhere? I ran Docker, and one of my endpoints inserts rows into a Google Sheet via its API...
- Also, the main reason I did not initially add a lot of info to the story is that I do not know what is relevant and what is not, because the email I received from them contains even less detail than my post. That is the only thing I really wish would change in the future. I would totally use Hetzner again with that fixed (if I am allowed to, that is).
- Added details here[1]. For clarity, I would just add that I was also running my backend on the instance, which could have been buggy and compromised (I cannot rule that out). I was not mining crypto, hosting porn, or doing anything else that I would think is clearly against their terms of service.
- This was the first email I received, after my welcome email on July 15.
This was a dev instance; maybe I was hacked, I cannot rule that out. But I was not expecting an outright ban with no details on what happened.
- OP Here.
I have a single configured email address, on which I received my welcome email on July 15 and the "Server Locking" email today.
Looking at the Hetzner dashboard, it seems they did not delete my instance, just turned it off and banned my IP so I cannot SSH into it. There is an option to request unblocking, which I will use soon; it asks me to answer "What caused this problem?" and "How do you plan to correct this problem and prevent it in the future?"
This was a development instance: running Docker, Postgres, SchemaSpy, some service emulators, Node, and VS Code, with the services accessed through SSH port-forwarding.
It seems there is an "Abuse" incident linked to the blocking of my IP but I only see the incident ID, no additional details.
This was a dev instance, so I did not think about making it airtight. I do not rule out that someone broke into it and violated their terms (this happens to production systems too, and I am definitely a worse engineer than the people there). If that happened, I am happy they locked it down, but I wish they informed their users in these cases: I had Git SSH keys and other secrets there, which I proactively revoked, and more information on the incident would definitely have helped me choose the right course of action.
- My first thought was that some VM virtualization repo got 100k stars on GitHub :))
Would be curious to know what the underlying AWS EC2 instance is.
Is each DB on a dedicated instance?
If not, are there per-customer IOPS bounds?