Ah, a Google search for 'limerick "finesse" "success"' had a few matching ones on the first page.
Tokens solve the following problem: the input layer of the neural net is a big array of numbers, and its size is a hyperparameter (a parameter you choose up front that controls the size or behavior of the network). A typical size for a big model like the GPTs is over 50,000. You have to somehow encode language as a sequence of assignments to this array of numbers. How do you do it?
The first and most obvious idea is characters. You could assign each Unicode code point to one of the slots in the input array, and then use what's called a one-hot encoding, where every number is zero except the slot for that character, which is one. You can do this, but it's not very efficient, because virtually all the training text is written in Latin-script languages and so almost all the slots will go unused.
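To make the inefficiency concrete, here's a rough sketch (not any model's actual input pipeline) of what one-hot encoding over all Unicode code points would look like:

    # A minimal sketch of character-level one-hot encoding (illustrative only).
    # The vocabulary is the full Unicode range, which is why it's wasteful:
    # almost every slot stays zero for typical English text.
    import numpy as np

    VOCAB_SIZE = 0x110000  # number of Unicode code points

    def one_hot_chars(text: str) -> np.ndarray:
        """Return a (len(text), VOCAB_SIZE) array with a single 1.0 per row."""
        out = np.zeros((len(text), VOCAB_SIZE), dtype=np.float32)
        for i, ch in enumerate(text):
            out[i, ord(ch)] = 1.0
        return out

    vecs = one_hot_chars("hi")
    print(vecs.shape)        # (2, 1114112): huge, and almost entirely zeros
    print(vecs[0].argmax())  # 104, the code point for 'h'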
A better way is to start with a big pile of text that you're going to train on, and then iteratively assign the slots to sequences of characters based on how common they are. This makes it easier for the network to learn and reason. These sequences are tokens. For example, "and" is very common, and so is "ing" (from the suffix), so those should get their own tokens. "SolidGoldMagikarp" isn't, so it really shouldn't: it should be represented as a sequence of tokens instead. There are algorithms that figure out the most efficient assignment of tokens to character sequences, which you can think of as the model's vocabulary, and then convert text into a sequence of these token numbers. OpenAI's tokenizer library is called tiktoken, and its core is written in Rust for speed.
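You can see this directly with tiktoken (pip install tiktoken). The encoding name below is my choice for the example: "cl100k_base" is the vocabulary used by the GPT-3.5/GPT-4 family, and earlier models used smaller ones like "r50k_base".

    # Quick demo with OpenAI's tiktoken library.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for s in [" and", "ing", "SolidGoldMagikarp"]:
        ids = enc.encode(s)
        print(f"{s!r} -> {ids} ({len(ids)} token(s))")

    # Decoding reverses the mapping, so ordinary text round-trips exactly.
    assert enc.decode(enc.encode("hello world")) == "hello world"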
The output of the network is likewise tokens, so you run the process in reverse at the end to get text back out of the array of floats that the network produces. Or more accurately, the network produces a probability for each token in its vocabulary, and then you can pick the most probable (an oversimplification: in reality the way you select the token is more complicated than that, as otherwise you get bad results).
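Here's a hedged sketch of that last step; the temperature/top-k scheme is just one common selection strategy, not necessarily what any particular model or API does internally:

    # A sketch of picking the next token from the model's output scores.
    # `logits` stands in for the raw per-token scores over the whole vocabulary;
    # temperature/top-k sampling is shown purely for illustration.
    import numpy as np

    def sample_next_token(logits, temperature=0.8, top_k=50):
        top = np.argsort(logits)[-top_k:]          # indices of the k best tokens
        scaled = logits[top] / temperature         # <1.0 sharpens, >1.0 flattens
        probs = np.exp(scaled - scaled.max())      # softmax, numerically stable
        probs /= probs.sum()
        return int(np.random.choice(top, p=probs))

    fake_logits = np.random.normal(size=50_000)    # pretend 50k-token vocabulary
    print(sample_next_token(fake_logits))          # id of the sampled token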
The problems here are to do with bugs in the training process, but they're no less interesting for that. Some character sequences that are very rare, and should really be represented by many tokens in a row, somehow ended up being considered important enough to be given their own whole token. The most common cause seems to be that the token vocabulary was computed on text containing garbage, highly repetitive content like debug logs from video games (hence the prevalence of obscure game characters like Leilan), and Reddit threads are clearly over-represented too. But then GPT struggles to work out what these tokens actually mean, because they hardly appear in the training set, so they seem to float together in embedding space, get easily conflated inside the model, and end up representing very vague or abstract concepts.