What's the strength of play for the GPT architecture? It's impressive that it figures out the rules, but does it play strong chess?

>> As they say, attention may indeed be all you need.

I don't think drawing general conclusions about intelligence from a board game is warranted. We didn't evolve to play chess or Go.


> What's the strength of play for the GPT architecture?

Pretty shit for a computer. He says his 50M model reached 1800 Elo (by the way, it's Elo, not ELO as the article incorrectly has it; the rating is named after Arpad Elo, a Hungarian-American physicist). From the bar graph it seems to be a bit better than Stockfish level 1 and a bit worse than Stockfish level 2.

Based on what we know, I think it's not surprising that these models can learn to play chess, but they get absolutely smoked by a "real" chess engine like Stockfish or Leela.

Afaik his small bot reaches 1300 and gpt-3.5-turbo-instruct reaches 1800. We have no idea how much, or on what kind of, PGN data the OpenAI model was trained. I heard a rumor that they specifically trained on games up to 1800, but I have no idea.
They also say "I left one training for a few more days and it reached 1500 ELO." I find it quite likely the observed performance is largely limited by the compute spent.
I can't see it being superhuman, that's for sure. Chess AIs are superhuman because they do vast searches, and I can't see that being replicated by an LLM architecture.
The apples-to-apples comparison would be an LLM against Leela with search turned off (evaluating only a single board state).

According to figure 6b [0], removing MCTS reduces Elo by about 40%; scaling 1800 Elo by 5/3 gives us 3000 Elo, which would be superhuman but not as good as e.g. LeelaZero.
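The arithmetic behind that estimate (illustrative only, taking the ~40% figure at face value):

```python
# If removing MCTS costs ~40% of the rating, the policy-only rating is
# 60% of the full rating, so full = policy / 0.6 = policy * 5/3.
policy_elo = 1800
full_elo = policy_elo / (1 - 0.40)
print(round(full_elo))  # 3000
```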

[0]: https://gwern.net/doc/reinforcement-learning/model/alphago/2...

Leela's policy network alone is around 2600 Elo, or around the level of a strong grandmaster. Note that Go is different from chess, since there are no draws, so skill differences are greatly magnified. Elo is always a relative scale (expected score is based on the Elo difference), so multiplication doesn't really make sense anyway.
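To see why multiplying ratings is meaningless: the standard Elo expected-score formula depends only on the rating gap, not on absolute values. A minimal sketch (the function name is mine):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# A 400-point gap predicts the same ~0.91 expected score whether the
# players are rated 2600/2200 or 1400/1000:
print(round(expected_score(2600, 2200), 2))  # 0.91
print(round(expected_score(1400, 1000), 2))  # 0.91
```

So "1800 x 5/3" is only a heuristic about where the rating might land, not an operation the Elo scale itself supports.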
I don't think 3000 is superhuman, though; it's peak human, as IIRC Magnus had an Elo of 3000 at one point.
Any particular reason why that shouldn't work well with fine-tuning of an LLM using reinforcement learning?
Chess AI used to dominate through sheer computational power, but to my knowledge that is no longer true: the engines beat all but the very strongest players even when run on phone CPUs.
Phone CPUs have gotten quite fast in the past decade, too.
Deep Blue analyzed some 200 million positions per second. Modern engines analyze three to four orders of magnitude fewer nodes per second, but have much more refined pruning of the search space.
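For a rough sense of scale (illustrative numbers based on the figures above):

```python
deep_blue_nps = 200_000_000  # ~2e8 positions/second

# "Three to four orders of magnitude fewer" puts modern engines roughly
# in the tens to hundreds of thousands of positions per second:
low, high = deep_blue_nps // 10**4, deep_blue_nps // 10**3
print(low, high)  # 20000 200000
```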
Point taken.

Positions analyzed per $, per W and per watt-dollar are surely much, much higher, though ;)

20 thousand positions per second is still a lot compared to a human though.
> What's the strength of play for the GPT architecture? It's impressive that it figures out the rules, but does it play strong chess?

Sometimes it is not a matter of "is it better? is it larger? is it more efficient?", but just a question.

mountains are mountains, men are men.
