Edited: I found this to be useful for explaining maximum entropy https://awjuliani.medium.com/maximum-entropy-policies-in-rei...
I think of it like chess: taking a piece increases the value, but you also have to consider the position (that is, how your pieces can move), and that is the entropy. So maximum entropy means taking pieces while still keeping strategic options open (the policy). But there must also be a trade-off term, which I'd call "confluence": how good it is to have many possible moves or new states available. I don't know how to relate that confluence term to entropy mathematically. From a computational point of view, a huge number of states makes computing the best move intractable, but at the same time it can make the optimum larger. So it is related to how well, given the available compute, the algorithm can approximate a maximum that is an increasing function of the number of states. There must be a trade-off there, which is what I called confluence.
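To make the entropy part concrete, here's a rough sketch of an entropy-regularized ("soft") policy: action probabilities are a softmax over value estimates, where the temperature controls how much the policy spreads probability across moves instead of greedily taking the highest-value one. The Q-values and temperature here are made-up illustrations, not from the linked article.

```python
import math

def max_entropy_policy(q_values, temperature=1.0):
    """Soft policy: pi(a) proportional to exp(Q(a)/T).
    High T -> near-uniform (high entropy); T -> 0 -> greedy argmax."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy of the policy, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical move values: capturing a piece vs. quieter positional moves.
q = [1.0, 0.5, 0.1]
spread_out = max_entropy_policy(q, temperature=10.0)  # keeps options open
greedy_ish = max_entropy_policy(q, temperature=0.1)   # almost always captures
```

So "maximum entropy" trades off immediate value against keeping the policy spread over many reachable states, which is roughly the tension described above.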
Also thanks for all explanations.
It's like saying, "Hey robot, remember what you learned last time? Don't forget it completely, but feel free to adjust a bit."
Great article, if you're interested: https://huyenchip.com/2023/05/02/rlhf.html#3_2_finetuning_us...
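The "don't forget completely, but adjust a bit" intuition is usually implemented as a KL penalty toward the frozen pre-finetuning model: the objective rewards the new behavior but subtracts a term for drifting from the reference policy. A minimal sketch, assuming per-token log-probs from both models; `beta` and the function name are illustrative, not from the article.

```python
def rlhf_objective(reward, logprobs_new, logprobs_ref, beta=0.1):
    """Reward minus a KL penalty toward the reference model.
    kl is the standard sample-based estimate: sum over tokens of
    log pi_new(token) - log pi_ref(token)."""
    kl = sum(ln - lr for ln, lr in zip(logprobs_new, logprobs_ref))
    # Maximizing this pushes reward up while keeping the policy
    # close to what the model learned before finetuning.
    return reward - beta * kl
```

If the new policy assigns its sampled tokens higher probability than the reference did, the KL estimate is positive and the objective is penalized, which is exactly the "adjust a bit, don't forget" behavior.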
So clearly it's possible to get long correlations right even without RL.
That's typically a setup where RL is desirable (even necessary): we have sparse rewards (only at the end) and give no details to the model on how to reach the solution. It's similar to training models to play chess against a specific opponent.
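A bare-bones sketch of what "sparse reward, no intermediate details" looks like in code: a REINFORCE-style update where the only signal is a single reward at the end of the episode, credited to every action taken along the way. Everything here (names, learning rate, tabular logits) is an illustrative toy, not any particular library's API.

```python
import math

def reinforce_update(logits, episode_actions, final_reward, lr=0.1):
    """One REINFORCE step with a sparse terminal reward.
    Each action's log-probability gradient is scaled by the same
    end-of-episode reward, since there are no intermediate rewards."""
    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        z = sum(exps)
        return [e / z for e in exps]

    for a in episode_actions:
        probs = softmax(logits)
        for i in range(len(logits)):
            # grad of log pi(a) wrt logit i: one-hot(a) - pi(i)
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * final_reward * grad
    return logits
```

With a positive terminal reward, every action taken during the episode becomes more likely; the model has to discover *how* to reach the reward on its own, which is why this setup calls for RL rather than supervised targets.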