That just gave me an idea! I wonder how useful (and for what) a model would be if it were trained using a two-phase approach:
1) Put the training data through an embedding model to create a giant vector index of the entire Internet.
2) Train a transformer LLM, but instead of relying only on its weights, it can also do lookups against the index.
It's like a MoE where one (or more) of the experts is a fuzzy Google search.
The best thing is that adding up-to-date knowledge won’t require retraining the entire model!
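The lookup half of this is basically retrieval over a vector index. A minimal sketch of that step, using a toy hash-based embed() as a stand-in for a real embedding model and brute-force cosine similarity instead of a proper ANN index (the corpus, embed(), and lookup() here are all hypothetical illustrations, not any real system):

```python
import zlib
import numpy as np

def embed(text, dim=64):
    # Toy stand-in for a real embedding model: hash each word into a
    # fixed-size vector, then L2-normalise so dot product = cosine similarity.
    v = np.zeros(dim)
    for word in text.lower().split():
        v[zlib.crc32(word.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Phase 1: embed the corpus (a tiny stand-in for "the entire Internet")
# into one matrix, i.e. the vector index.
corpus = [
    "transformers use attention over token embeddings",
    "vector indexes enable fast nearest-neighbour search",
    "mixture of experts routes tokens to specialist subnetworks",
]
index = np.stack([embed(doc) for doc in corpus])

# Phase 2 (the lookup the model would make instead of relying only on
# its weights): retrieve the top-k documents by cosine similarity.
def lookup(query, k=2):
    scores = index @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

print(lookup("fast nearest-neighbour vector search"))
```

In a real system the retrieved text would be spliced into the model's context (or fused into its hidden states during training), and the index could be rebuilt with fresh documents without touching the model weights, which is the up-to-date-knowledge point.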