Preferences

Full text agentic retrieval. Instead of cosine similarity on vectors, parsing metadata through an agentic loop.

To give a real world example, the way Claude Code works versus how Cursor's embedded database works.


How do you do that on 5 million documents?
People are usually not querying across 5 million documents in a single scope.

If you want something as simple as "suggest similar tweets" or something across millions of things then embeddings still work.

But if you want something like "compare the documents across these three projects" then you would use full text metadata extraction. Keywords, summaries, table of contents, etc to determine data about each document and each chunk.

This item has no comments currently.

Keyboard Shortcuts

Story Lists

j
Next story
k
Previous story
Shift+j
Last story
Shift+k
First story
o Enter
Go to story URL
c
Go to comments
u
Go to author

Navigation

Shift+t
Go to top stories
Shift+n
Go to new stories
Shift+b
Go to best stories
Shift+a
Go to Ask HN
Shift+s
Go to Show HN

Miscellaneous

?
Show this modal