Ran a fun little experiment this evening. You can watch a tiny transformer model (Karpathy's NanoGPT) learn the features and structure of the English language in real time, in just over 12 minutes, from the tiny_shakespeare dataset.
You can see how it slowly picks up more and more coherence over 5000 training steps.
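To get a feel for the numbers, here is a rough sketch of a sampling schedule and the implied time per step. Only the 5000 steps and the roughly 12 minutes come from the run described here. The sampling interval of 500 and the helper names `sample_steps` and `seconds_per_step` are hypothetical and are not part of NanoGPT.

```python
def sample_steps(max_iters: int, interval: int) -> list[int]:
    """Steps at which to print a generated text sample.

    Starts at step 0 (untrained model) and always includes the final step.
    """
    steps = list(range(0, max_iters, interval))
    steps.append(max_iters)
    return steps


def seconds_per_step(total_seconds: float, max_iters: int) -> float:
    """Average wall-clock time spent on one training step."""
    return total_seconds / max_iters


MAX_ITERS = 5000          # from the post
SAMPLE_INTERVAL = 500     # assumption: chosen only for illustration
TOTAL_SECONDS = 12 * 60   # "just over 12 minutes", so this is a lower bound

steps = sample_steps(MAX_ITERS, SAMPLE_INTERVAL)
rate = seconds_per_step(TOTAL_SECONDS, MAX_ITERS)

print(f"{len(steps)} samples at steps {steps}")
print(f"at least {rate:.3f} s per step")
```

Printing a sample at each of these steps is one simple way to watch the output go from noise to something Shakespeare-shaped as training progresses.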