In the legacy bbchallenge seed database, machines were run for exactly 47,176,870 steps, and we were left with about 80M non-halting candidates. The discrepancy comes from Coq-BB5 not excluding 0RB as the first transition, plus other minor factors.
Also, "There are 181 million ish essentially different Turing Machines with 5 states", it is important to note that we end up with this number only after the proof is completed, knowing this number is as hard as solving BB(5).
"essentially different" is not a statically-checked property. It is discovered by the enumeration algorithm (Tree Normal Form, see Section 3 of https://arxiv.org/pdf/2509.12337); in particular, for each machine, the algorithm needs to know it if will ever reach an undefined transition because if it does it will visit the children of that machine (i.e. all ways to set that reached undefined transition).
Knowing if an undefined transition will be ever reached is not computable, hence knowing the number of enumerated machines is not computable in general and is as hard as solving BB(5).
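For concreteness, here is a minimal sketch of that tree enumeration. It is not the Coq-BB5 code: the encoding, the function names, and the step limit standing in for the undecidable "will this transition ever be reached?" question are all illustrative, and the symmetry reductions are omitted.

```python
# Sketch of Tree Normal Form enumeration. A machine is a dict mapping
# (state, symbol) -> (write, move, next_state); a missing entry is an
# undefined transition. Reaching one on the blank tape makes the machine
# a node whose children fill in that transition.
STATES = "ABCDE"
SYMBOLS = (0, 1)
MOVES = (-1, +1)  # left, right

def run_until_undefined(tm, max_steps):
    """Run from the all-zero tape; return the first undefined (state, symbol)
    reached, or None if none is reached within max_steps."""
    tape, pos, state = {}, 0, "A"
    for _ in range(max_steps):
        symbol = tape.get(pos, 0)
        if (state, symbol) not in tm:
            return (state, symbol)
        write, move, nxt = tm[(state, symbol)]
        tape[pos] = write
        pos += move
        state = nxt
    return None  # in the real pipeline, deciders settle these cases, not a step limit

def children(tm, hole):
    """All ways to fill the reached undefined transition: any write and move,
    and any already-used state plus the single lowest unused state."""
    used = {s for (s, _) in tm} | {s for (_, _, s) in tm.values()} | {"A"}
    fresh = next((s for s in STATES if s not in used), None)
    allowed = [s for s in STATES if s in used] + ([fresh] if fresh else [])
    for write in SYMBOLS:
        for move in MOVES:
            for nxt in allowed:
                child = dict(tm)
                child[hole] = (write, move, nxt)
                yield child

def enumerate_tnf(tm, max_steps=1000):
    """Depth-first walk of the TNF tree rooted at tm (start from the empty table {})."""
    yield tm                                   # every visited node is an enumerated machine
    hole = run_until_undefined(tm, max_steps)
    if hole is None:
        return                                 # never reaches an undefined transition: a leaf
    for child in children(tm, hole):
        yield from enumerate_tnf(child, max_steps)
```

Restricting STATES to "AB" makes enumerate_tnf({}) finish instantly and shows the shape of the tree; the point of the quoted remark is that for 5 states you only know the true number of nodes once every "does it reach an undefined transition?" question has been settled.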
When I first heard this, I thought about using similar symmetry arguments (e.g. swapping left-moves and right-moves). Maybe there are also more elaborate symmetry arguments of some kind.
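That particular symmetry is easy to state with the same toy encoding as the sketch above: flipping every move direction yields the mirror machine, which halts (or not) after exactly the same number of steps, so only one machine from each mirror pair needs to be kept. The helper below is only an illustration of the idea, not something from the Coq-BB5 pipeline.

```python
def mirror(tm):
    """Left/right mirror of a machine: flip every move direction. The mirrored
    machine performs the same computation reflected in space, so it runs for
    exactly the same number of steps as the original."""
    return {key: (write, -move, nxt) for key, (write, move, nxt) in tm.items()}
```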
Isn't it fair to say that there is no single objective definition of what differences between machines are "essential" here? If you have a stronger theory and stronger tools, you can draw more distinctions between TMs; with a weaker theory and weaker tools, you can draw fewer distinctions.
By analogy, suppose you were looking at groups. As you get a more sophisticated group theory you can understand more reasons or ways that two groups are "the same". I guess there is a natural limit of group isomorphism, but perhaps there are still other things where group structure or behavior is "the same" in some interesting or important ways even between non-isomorphic groups?
Now suppose you are running a Turing machine program from the beginning. The only state it has visited so far is state A. It runs until it reaches a state that has not been visited yet. What state is it? B? C? D? According to "tree normal form", the name for that next state just is the earliest unused state name, which in this case is B.
Visited states so far are A and B. Run until an unvisited state is reached again. What is its name? C? D? E? Tree normal form says that the state will be called C. And the next newly visited state will be D, etc. In general, the canonical form for a Turing machine program will be the one that puts initial state visits in alphabetical order. (This concept also applies to the multi-color case.)
It's not possible to tell of an arbitrary TM program whether or not that program is in tree normal form. But this proof doesn't look at arbitrary programs. It generates candidate programs by tracing out the normal tree starting from the root, thereby bypassing non-normal programs altogether.
That is what "essentially different" means here.
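As a small illustration of that naming rule (not the Coq-BB5 definition, where the names are produced by the enumeration itself rather than by renaming an existing table), here is a sketch that relabels a machine's states into first-visit order, reusing the toy encoding from the earlier sketch:

```python
def tnf_rename(tm, max_steps=10_000):
    """Relabel states into first-visit order (A, B, C, ...) for the run that
    starts on the all-zero tape. States never visited within max_steps just
    receive the leftover names."""
    order = {"A": "A"}                       # old name -> canonical name
    tape, pos, state = {}, 0, "A"
    for _ in range(max_steps):
        key = (state, tape.get(pos, 0))
        if key not in tm:
            break                            # undefined transition: the run stops here
        write, move, nxt = tm[key]
        if nxt not in order:
            order[nxt] = STATES[len(order)]  # earliest unused canonical name
        tape[pos] = write
        pos += move
        state = nxt
    for s in STATES:                         # hand leftover names to unvisited states
        if s not in order:
            order[s] = STATES[len(order)]
    return {(order[s], sym): (w, m, order[n])
            for (s, sym), (w, m, n) in tm.items()}
```

A machine already in tree normal form is a fixed point of this relabeling (provided the run is long enough to visit every reachable state); the catch, as noted above, is that you cannot in general bound how long that takes.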
If you count transition tables in which states are reachable and canonically ordered, then you get a count of 632,700 * 4^10 = 663,434,035,200 [2]. These machines can behave differently on arbitrary tape contents.
TNF further reduces these counts by examining machine behaviour when starting on an empty tape.
The problem is knowing how many states are actually reachable, and how many are dead code. This is impossible to decide in general thanks to Rice's theorem and whatnot. In this case, it involves deciding all 4-state machines.
Is there some reason why Rocq is the proof assistant of choice and not one of the others?
I did some small proofs in Dafny for a few of the simpler deciders (most of which didn't end up being used).
At the time, I found Dafny's "program extraction" (i.e. transpilation to a "real" programming language) very cumbersome, and it produced extremely inefficient code. I think a much improved Rust target has been implemented since then.
If nothing else, it’ll inspire the next generation of mathematicians.
Yeah, one lesson is that if you relax your halting condition to entering a loop at some point, you get much longer runtimes; see machine Skelet #1 (https://bbchallenge.org/1LC1LE_---1LD_1RD0LD_1LA1RE_0LB0RC), which enters a loop after about 10^51 steps, and the loop is about 10^9 steps long.
Is this suspected or proven? I would love to read the proof if it exists.
No program can say what another will do. Now, I won't just assert that, I'll prove it to you: I will prove that although you might work til you drop, you can't predict whether a program will stop.
Imagine we have a procedure called P that will snoop in the source code of programs to see there aren't infinite loops that go round and around; and P prints the word "Fine!" if no looping is found.
You feed in your code, and the input it needs, and then P takes them both and it studies and reads and computes whether things will all end as they should (as opposed to going loopy the way that they could).
Well, the truth is that P cannot possibly be, because if you wrote it and gave it to me, I could use it to set up a logical bind that would shatter your reason and scramble your mind.
Here's the trick I would use - and it's simple to do. I'd define a procedure - we'll name the thing Q - that would take any program and call P (of course!) to tell if it looped, by reading the source;
And if so, Q would simply print "Loop!" and then stop; but if no, Q would go right back to the top, and start off again, looping endlessly back, til the universe dies and is frozen and black.
And this program called Q wouldn't stay on the shelf; I would run it, and (fiendishly) feed it itself. What behaviour results when I do this with Q? When it reads its own source, just what will it do?
If P warns of loops, Q will print "Loop!" and quit; yet P is supposed to speak truly of it. So if Q's going to quit, then P should say, "Fine!" - which will make Q go back to its very first line!
No matter what P would have done, Q will scoop it: Q uses P's output to make P look stupid. If P gets things right then it lies in its tooth; and if it speaks falsely, it's telling the truth!
I've created a paradox, neat as can be - and simply by using your putative P. When you assumed P you stepped into a snare; Your assumptions have led you right into my lair.
So, how to escape from this logical mess? I don't have to tell you; I'm sure you can guess. By reductio, there cannot possibly be a procedure that acts like the mythical P.
You can never discover mechanical means for predicting the acts of computing machines. It's something that cannot be done. So we users must find our own bugs; our computers are losers!
by Geoffrey K. Pullum, Stevenson College, University of California, Santa Cruz, CA 95064
From Mathematics Magazine, October 2000, pp. 319-320.
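The poem's bind is the standard diagonalization argument. For readers who prefer code, here is the same construction as a minimal Python sketch, where P is the hypothetical loop snooper and every name is illustrative:

```python
def P(source: str, data: str) -> bool:
    """Hypothetical loop snooper: True ("Fine!") iff running `source` on `data`
    halts. Assumed to exist only for the sake of contradiction."""
    raise NotImplementedError("no such procedure can exist")

Q_SOURCE = """
def Q(source):
    if P(source, source):   # P says "Fine!" (the program halts)...
        while True:         # ...so Q loops forever instead,
            pass
    print("Loop!")          # otherwise P says it loops, so Q prints "Loop!" and stops.
"""

# Feed Q its own source, i.e. run Q(Q_SOURCE):
#   if P(Q_SOURCE, Q_SOURCE) is True,  Q loops forever  -> P was wrong;
#   if P(Q_SOURCE, Q_SOURCE) is False, Q halts at once  -> P was wrong again.
# Either way P is incorrect on Q, so the assumed P cannot exist.
```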
[See specifics of the pipeline in Table 3 of the linked paper]
* There are 181 million-ish essentially different Turing Machines with 5 states; these were first enumerated exhaustively.
* Then, each machine was run for about 100 million steps. Of the 181 million, about 25% of them halt within this many steps, including the Champion, which ran for 47,176,870 steps before halting.
* This leaves 140 million machines which run for a long time.
So the question is: do those TMs run FOREVER, or have we just not run them long enough?
The goal of the BB challenge project was to answer this question. There is no universal algorithm that works on all TMs, so instead a series of (semi-)deciders were built. Each decider takes a TM, and (based on some proven heuristic) classifies it as either "definitely runs forever" or "maybe halts".
Four deciders ended up being used:
* Loops: run the TM for a while, and if it re-enters a previously-seen configuration, it definitely has to loop forever. Around 90% of machines either do this or halt, so this covers most of them (a naive version is sketched just after this list).
6.01 million TMs remain.
* NGram CPS: abstractly simulates each TM, tracking a set of binary "n-grams" that are allowed to appear on each side. Computes an over-approximation of reachable states. If none of those abstract states enter the halting transition, then the original machine cannot halt either.
Covers 6.005 million TMs. Around 7000 TMs remain.
* RepWL: Attempts to derive counting rules that describe TM configurations. The NGram model can't "count", so this catches many machines whose patterns depend on parity. Covers 6557 TMs.
There are only a few hundred TMs left.
* FAR: Attempts to describe each TM's reachable configurations as a regex/FSM.
* WFAR: like FAR but adds weighted edges, which allows some non-regular languages (like matching parentheses) to be described
* Sporadics: around 13 machines had complicated behavior that none of the previous deciders worked for. So hand-written proofs (later translated into Rocq) were written for these machines.
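For a feel of what the simplest of these looks like, here is a naive sketch of the Loops idea referenced above (detect an exact repeat of a configuration). It is not the Coq-BB5 implementation, which also catches translated cycles and manages memory far more carefully; the encoding, function name, and step limit are illustrative. Including the head position in the stored configuration keeps the check sound: a repeat can only mean a genuine cycle.

```python
# A machine is a dict (state, symbol) -> (write, move, next_state);
# a missing entry means the machine halts there.
def loops_decider(tm, max_steps=100_000):
    """Classify the empty-tape run as "halts", "nonhalting", or "unknown"."""
    tape, pos, state = {}, 0, "A"
    seen = set()
    for _ in range(max_steps):
        config = (state, pos, frozenset((p, v) for p, v in tape.items() if v))
        if config in seen:
            return "nonhalting"        # exact repeat of a configuration: loops forever
        seen.add(config)
        symbol = tape.get(pos, 0)
        if (state, symbol) not in tm:
            return "halts"             # undefined transition reached: the machine halts
        write, move, nxt = tm[(state, symbol)]
        tape[pos] = write
        pos += move
        state = nxt
    return "unknown"                   # hand the machine to the next decider in the pipeline
```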
All of the deciders were eventually written in Rocq, which means that they are coupled with a formally verified proof that they actually work as intended ("if this function says the TM runs forever, then the corresponding mathematical TM really never halts").
Hence, all 5-state TMs have been formally classified as halting or non-halting. The longest-running halter is therefore the champion; it was already suspected to be the champion, but this proves that there wasn't any longer-running 5-state TM.