1. https://jalammar.github.io/illustrated-word2vec/
2. https://jalammar.github.io/visualizing-neural-machine-transl...
3. https://jalammar.github.io/illustrated-transformer/
4. https://jalammar.github.io/illustrated-bert/
5. https://jalammar.github.io/illustrated-gpt2/
And from there it's mostly been work on improving optimization (at both training and inference time), training techniques (many stages), data (quality and modality), and scale.
---
There are also state space models, but I don't believe they've gone mainstream yet.
https://newsletter.maartengrootendorst.com/p/a-visual-guide-...
And diffusion models, but I'm struggling to find a good resource, so here's a demo instead: https://ml-gsai.github.io/LLaDA-demo/
---
All this being said, many tasks are still solved very well by a linear model over TF-IDF features, and the results are actually interpretable.
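For concreteness, here's a minimal sketch of that baseline with scikit-learn (the tiny dataset and labels are made up purely for illustration):

```python
# TF-IDF features + linear classifier: the classic interpretable baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "the movie was great, I loved it",
    "terrible plot, wasted two hours",
    "wonderful acting and a great story",
    "boring, I walked out halfway",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy data)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)

print(clf.predict(["great story, loved the acting"]))  # likely [1]

# Interpretability: each feature is a word, each coefficient its weight.
vec = clf.named_steps["tfidfvectorizer"]
lr = clf.named_steps["logisticregression"]
for word, coef in sorted(zip(vec.get_feature_names_out(), lr.coef_[0]),
                         key=lambda t: t[1]):
    print(f"{word:>10s} {coef:+.2f}")
```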
Indeed, before that there was a lot of work on applying classical ML classifiers (Naive Bayes, Decision Trees, SVMs, Logistic Regression...) and clustering algorithms (fancily referred to as unsupervised ML) to bag-of-words vectors. This was a big field, with some overlap with Information Retrieval, which contributed fancier weightings and normalizations of bag-of-words vectors (TF-IDF, BM25). There was also the whole field of Topic Modeling.
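BM25, for instance, is just a document-length-aware reweighting of term counts. A rough sketch over a toy corpus, using the usual k1/b defaults and the Lucene-style "+1 inside the log" IDF:

```python
import math
from collections import Counter

docs = [
    "the quick brown fox".split(),
    "the lazy dog sleeps".split(),
    "the fox jumps over the lazy dog".split(),
]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N
df = Counter(term for d in docs for term in set(d))  # document frequency

def bm25(query, doc, k1=1.5, b=0.75):
    tf = Counter(doc)  # term frequency within this document
    score = 0.0
    for term in query:
        if term not in tf:
            continue
        idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
        # Saturating TF, normalized by document length relative to average.
        score += idf * tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

for d in docs:
    print(f"{bm25(['lazy', 'fox'], d):.3f}", " ".join(d))
```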
Before that there was a ton of statistical NLP modeling (Markov chains and such), primarily focused on machine translation in the days before neural networks got good enough (think the early versions of Google Translate).
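A first-order Markov chain over words is about as simple as those models get; a toy sketch with a made-up corpus:

```python
# Bigram (first-order Markov) language model: count transitions, then
# generate by sampling the next word given only the current one.
import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

transitions = defaultdict(list)
for prev, cur in zip(corpus, corpus[1:]):
    transitions[prev].append(cur)

random.seed(0)  # reproducible toy output
word, out = "the", ["the"]
for _ in range(8):
    word = random.choice(transitions[word])
    out.append(word)
print(" ".join(out))
```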
And before that there were a few decades of research on grammars (starting with Chomsky), with a lot of overlap with compilers, theoretical CS (state machines and such), and symbolic AI (Lisps, logic programming, expert systems...).
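For a taste of that era, here's a tiny context-free grammar run through NLTK's chart parser (the grammar and sentence are toy examples):

```python
import nltk

# A minimal CFG of the kind that dominated pre-statistical NLP.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chased' | 'saw'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased a cat".split()):
    tree.pretty_print()  # draws the parse tree as ASCII art
```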
I myself don't have a very clear picture of all of this. I learned some of it in undergrad and read a few ancient NLP books (60s-90s) out of curiosity. I started around the time when NLP, and AI in general, had been rather stagnant for a decade or two. It was rather boring and niche, believe it or not, but it was starting to be revitalized by the new wave of ML, and then by word2vec and DNNs.