Anything that operates by checking the implementation can find implementation bugs. Even if such tools surface design bugs, it's too late: the design has already been built.
As an analogy: you can document the code with comments explaining why you do what you do. These, along with the code, can be reviewed and kept under revision control.
That said, do you still recommend writing separate design documents? If so, why? This would also answer your question about why we need a design spec that's separate from the implementation.
(TODO: link to post about university-level FM programs)
I've always wondered whether there's some advantage to translating to another language's AST for static and/or dynamic analysis. AFAIU, fuzzing is still necessary even given a formally verified implementation?
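e.g., a minimal property-based test with Hypothesis, fuzzing the implementation against properties the design spec promises (my_sort is a hypothetical function under test; sorted() stands in for it here):

    from collections import Counter
    from hypothesis import given, strategies as st

    def my_sort(xs):
        return sorted(xs)  # stand-in for the implementation under test

    # Even if the *design* is formally verified, the implementation can
    # still diverge from it; fuzzing probes that gap.
    @given(st.lists(st.integers()))
    def test_my_sort_matches_spec(xs):
        out = my_sort(xs)
        assert all(a <= b for a, b in zip(out, out[1:]))  # spec: ordered
        assert Counter(out) == Counter(xs)                # spec: permutation of input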
Python is Turing complete, but does TLA+ need to be? Is there an in-Python syntax that tooling can expand in place, so that diffs stay readable? And how much overlap should there be between existing runtime-check DbC decorators and these modeling primitives and feature-extraction transforms? (Enough to minimize cognitive overload for human review, while still describing the domains, ranges, complexity costs, inconstant timings, and the necessary as well as the possible outcomes given concurrency.)
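One possible shape for that overlap (entirely hypothetical, not an existing library): a DbC decorator that both enforces its condition at runtime and records it as plain data, so the same declaration could feed a runtime checker, a pretty differ, or a model extractor:

    import functools

    def require(cond, label):
        """Hypothetical: one declaration, two consumers -- a runtime
        DbC check, plus diffable/extractable model data on the function."""
        def deco(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                assert cond(*args, **kwargs), label  # runtime check
                return fn(*args, **kwargs)
            # Accumulate contract metadata for external tooling.
            wrapper.__model__ = getattr(fn, "__model__", []) + [("pre", label)]
            return wrapper
        return deco

    @require(lambda n: n >= 0, "n >= 0")
    def countdown(n: int) -> int:
        return n - 1

    print(countdown.__model__)  # [('pre', 'n >= 0')] -> input to a model extractor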
"FaCT: A DSL for Timing-Sensitive Computation" and side channels https://www.hackerneue.com/item?id=38527663
Is there still a place for implementation-language-agnostic tools for formal methods?
icontract and pycontracts do runtime type checking and value constraints with Design-by-Contract (DbC) patterns that more clearly separate the preconditions and postconditions from the command, per Hoare logic. Both can read Python type annotations and optionally enforce them at runtime. Both also support runtime checking of invariants: values that shouldn't have changed by the time postconditions are evaluated after the function returns, but could have due to e.g. parallelism, concurrency, (network I/O) latency, and pass-by-reference in distributed systems.
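For example, a minimal icontract sketch (the Account class is hypothetical; the decorator names are icontract's documented API):

    import icontract

    # Invariant: re-checked after every public method call.
    @icontract.invariant(lambda self: self.balance >= 0)
    class Account:
        def __init__(self, balance: int) -> None:
            self.balance = balance

        # Precondition: checked before the call.
        @icontract.require(lambda amount: amount > 0)
        # Snapshot: capture the old balance for the postcondition to compare against.
        @icontract.snapshot(lambda self: self.balance, name="balance")
        # Postcondition: checked after the call, against the OLD snapshot.
        @icontract.ensure(lambda self, OLD, amount: self.balance == OLD.balance - amount)
        def withdraw(self, amount: int) -> None:
            self.balance -= amount

Account(100).withdraw(150) raises icontract.ViolationError: the postcondition holds, but the class invariant (balance >= 0) fails when re-checked after the method returns.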