- amw-zero parentSuper interesting, but I think this will be very difficult in practice due to the gigantic effect of nondeterminism at the hardware level (caches, branch prediction, out of order execution, etc.)
- I think this is less about guarantees and more about understanding behavioral characteristics in response to different loads.
I personally could care less about proving that an endpoint always responds in less than 100ms say, but I care very much about understanding where various saturation points are in my systems, or what values I should set for limits like database connections, or how what the effect of sporadic timeouts are, etc. I think that's more the point of this post (which you see him talk about in other posts on his blog).
- This is the single most impactful blog post I've read in the last 2-3 years. It's so obvious in retrospect, but it really drove the point home for me that functional correctness is only the beginning. I personally had been over-indexing on functional correctness, which is understandable since a reliable but incorrect system isn't valuable.
But, in practice, I've spent just as much time on issues introduced by perf / scalability limitations. And the post thesis is correct: we don't have great tools for reasoning about this. This has been pretty much all I've been thinking about recently.
- Compare the spec with the application here: https://concerningquality.com/model-based-testing/
I think we've become used to the complexity in typical web applications, but there's a difference between familiar and simple (simple vs. easy, as it were). The behavior of most business software can be very simply expressed using simple data structures (sets, lists, maps) and simple logic.
No matter how much we simply it, via frameworks and libraries or whatever have you, things like serialization, persistence, asynchrony, concurrency, and performance end up complicating the implementation. Comparing this against a simpler spec is quite nice in practice - and a huge benefit is now you can consult a simple in-memory spec vs. worrying about distributed system deployments.
- Research points to there being a quadratic relationship between automated proof and code size: https://trustworthy.systems/publications/nictaabstracts/Mati....
Specifically, the relationship is between the _specification_ and the proof, and it was done for proofs written in Isabelle and not Lean.
The good news is that more and more automation is possible for proofs, so the effort to produce each proof line will likely go down over time. Still, the largest full program we've fully verified is much less than 100,000 LOC. seL4 (verified operating system) is around 10,000 lines IIRC.
- 2 points
- There are 2 main problems in generative testing:
- Input data generation (how do you explore enough of the program's behavior to have confidence that you're test is a good proxy for total correctness)
- Correctness statements (how do you express whether or not the program is correct for an arbitrary input)
When you are porting a program, you have a built in correctness statement: The port should behave exactly as the source program does. This greatly simplifies the testing process.
- How do table access methods work with standard PG concepts such as the shared_buffers cache? Is that irrelevant since data is stored in FDB? Also, how do the deployment semantics of FDB affect this. If I remember correctly, you typically run one FDB process per CPU on a machine. How do PG processes map to those?