- Quick-running tests, detailed assertions —> PBT
- Longer tests, just looking for a crash —> fuzzing.
- In between, who knows?
https://hypothesis.works/articles/what-is-property-based-tes...
From this point of view, a property-based testing library is really two parts:
1. A fuzzer.
2. A library of tools for making it easy to construct property-based tests using that fuzzer.
What should I (we?) be building on top of Go's fuzzer to make it easier to construct property-based tests?

My current strategy is to interpret a byte stream as a sequence of "commands", let those be transitions that drive my "state machine", then test all invariants I can think of at every step.
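Concretely, a minimal sketch of that strategy with Go's fuzzer could look like the following; `Set`, `New`, `Add`, `Remove`, `Contains` and `checkInvariants` are hypothetical stand-ins for whatever data structure is actually under test:

    package set

    import "testing"

    // runCommands decodes data as a command sequence against a hypothetical
    // Set (New/Add/Remove/Contains/checkInvariants stand in for the real
    // structure under test) and checks every invariant after every step.
    func runCommands(t *testing.T, data []byte) {
        s := New()
        for _, b := range data {
            key := int(b >> 1) // low bit selects the command, the rest is the key
            if b&1 == 0 {
                s.Add(key)
                if !s.Contains(key) {
                    t.Fatalf("Contains(%d) = false right after Add", key)
                }
            } else {
                s.Remove(key)
                if s.Contains(key) {
                    t.Fatalf("Contains(%d) = true right after Remove", key)
                }
            }
            if err := s.checkInvariants(); err != nil { // e.g. ordering/balance checks
                t.Fatal(err)
            }
        }
    }

    func FuzzCommands(f *testing.F) {
        f.Fuzz(runCommands)
    }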
What you describe (commands) is commonly used with model-based testing and distributed systems.
> Python is Turing complete, but does [TLA+] need to be? Is there an in-Python syntax that can be expanded in place by tooling for pretty diffs? How much overlap should there be between existing runtime-check DbC decorators and these modeling primitives and feature-extraction transforms? (In order to: minimize cognitive overload for human review; sufficiently describe the domains, ranges, complexity costs, inconstant timings, and the necessary and also the possible outcomes given concurrency.)
From "S2n-TLS – A C99 implementation of the TLS/SSL protocol" https://www.hackerneue.com/item?id=38510025 :
> But formal methods (and TLA+ for distributed computation) don't eliminate side channels. [in CPUs e.g. with branch prediction, GPUs, TPUs/NPUs, Hypervisors, OS schedulers, IPC,]
Still, though, coverage-based fuzzing:
From https://www.hackerneue.com/item?id=30786239 :
> OSS-Fuzz runs ClusterFuzz[Lite?] for many open source repos and feeds OSV, the OpenSSF vulnerability format: https://github.com/google/osv#current-data-sources
From "Automated Unit Test Improvement using Large Language Models at Meta" https://www.hackerneue.com/item?id=39416628 :
> "Fuzz target generation using LLMs" (2023), OSSF//fuzz-introspector*
> gh topic: https://github.com/topics/coverage-guided-fuzzing
The fuzzing computational task is similar to the genetic-algorithm computational task, in that both explore combinatorial search spaces of potentially unbounded size, so there is a need for parallelism and thus for partitioning the work for distributed computation. (But there is no computational oracle to predict whether any particular sequence of input combinations under test will deterministically halt on any of the distributed workers, so gradient-like heuristics help to skip over apparently desolate territory when the error hasn't changed in a while.)
The fuzzing computational task, then: partition the set of all input combinations for distributed execution, with execute-once semantics or consensus to resolve redundant results.
DbC Design-By-Contract patterns include Preconditions and Postconditions (which include tests of Invariance)
We test Preconditions to exclude Inputs that do not meet the specified Ranges, and we verify the Ranges of Outputs in Postconditions.
We test Invariance to verify that there haven't been side-effects in other scopes; that variables and their attributes haven't changed after the function - the Command - returns.
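As a rough illustration of those three checks (hand-rolled in Go, not any particular DbC library; `Account` and `Withdraw` are made-up names):

    package dbc

    import "fmt"

    // Account is a hypothetical piece of state used to illustrate the checks.
    type Account struct {
        Owner   string
        Balance int
    }

    // Withdraw is the "command"; the contract checks wrap its body.
    func Withdraw(acct *Account, amount int) (int, error) {
        // Precondition: exclude inputs that do not meet the specified range.
        if amount <= 0 || amount > acct.Balance {
            return 0, fmt.Errorf("precondition failed: amount %d out of range", amount)
        }
        before, owner := acct.Balance, acct.Owner // snapshot for later checks

        acct.Balance -= amount // the actual work

        // Postcondition: verify the range (and value) of the output.
        if acct.Balance < 0 || acct.Balance != before-amount {
            return 0, fmt.Errorf("postcondition failed: balance %d", acct.Balance)
        }
        // Invariance: verify there were no side effects on unrelated state.
        if acct.Owner != owner {
            return 0, fmt.Errorf("invariance failed: Owner changed")
        }
        return acct.Balance, nil
    }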
DbC: https://en.wikipedia.org/wiki/Design_by_contract :
> Design by contract has its roots in work on formal verification, formal specification and Hoare logic.
TLA+ > Language: https://en.wikipedia.org/wiki/TLA%2B#Language
Formal verification: https://en.wikipedia.org/wiki/Formal_verification
From https://www.hackerneue.com/item?id=38138319 :
> Property testing: https://en.wikipedia.org/wiki/Property_testing
> awesome-python-testing#property-based-testing: https://github.com/cleder/awesome-python-testing#property-ba...
> Fuzzing: https://en.wikipedia.org/wiki/Fuzzing
Software testing > Categorization > [..., Property testing, Metamorphic testing] https://en.wikipedia.org/wiki/Software_testing#Categorizatio...
--
From https://www.hackerneue.com/item?id=28494885#28513982 :
> https://github.com/dafny-lang/dafny #read-more
> Dafny Cheat Sheet: https://docs.google.com/document/d/1kz5_yqzhrEyXII96eCF1YoHZ...
> Looks like there's a Haskell-to-Dafny converter.
haskell2dafny: https://gitlab.doc.ic.ac.uk/dcw/haskell-subset-to-dafny-tran...
--
Controlled randomness: tests of randomness (random uniform, not random normal), rngd:
From https://www.hackerneue.com/item?id=40630177 :
> google/paranoid_crypto.lib.randomness_tests: https://github.com/google/paranoid_crypto/tree/main/paranoid... docs: https://github.com/google/paranoid_crypto/blob/main/docs/ran...
What's the intent behind building on top of Go's fuzzer, as opposed to the approach testing/quick takes?
Also: https://www.hackerneue.com/item?id=30385195
But it occurs to me that I could try to build on the same API and compare them for effectiveness.
Indeed. And respect to software that knows when it is complete, but is there some relevance to this here?
> Also: https://www.hackerneue.com/item?id=30385195
While the distinction between fuzzing and PBT is, er, fuzzy, what is unquestionable is that PBT is not meant to be coverage driven. It is a means to document behaviour for other developers in a way that is, in some cases, more descriptive than providing a fixed set of examples or other "traditional" ways of providing such documentation. Seeking coverage and "interesting" cases, rather than providing documentation, is decidedly the role of fuzzing.
Traditional property-based testing is implemented simply as a library, so such libraries don't necessarily have coverage information to guide their random input generation.
However, depending on the lib, you can get some nice quality of life improvements. One "nice to have" is shrinking. See the "Shrinking" section here https://tech.fpcomplete.com/blog/quickcheck-hedgehog-validit...
Having combinators to compose generators is also great.
Libs may also have a known set of "bad" values that cause exceptional behavior.
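To make the shrinking and combinator ideas concrete, here is a rough, library-agnostic sketch in Go; `Gen`, `Map`, `OneOf`, `SmallOrBad` and `ShrinkInt` are made-up names, not any existing library's API:

    package pbt

    import (
        "math"
        "math/rand"
    )

    // Gen is a generator; Map and OneOf are the kind of combinators a
    // property-testing library offers for composing generators.
    type Gen[T any] func(r *rand.Rand) T

    func Map[A, B any](g Gen[A], f func(A) B) Gen[B] {
        return func(r *rand.Rand) B { return f(g(r)) }
    }

    func OneOf[T any](gs ...Gen[T]) Gen[T] {
        return func(r *rand.Rand) T { return gs[r.Intn(len(gs))](r) }
    }

    // Example composition: usually a small int, sometimes a "known bad" value.
    func SmallOrBad() Gen[int] {
        small := Gen[int](func(r *rand.Rand) int { return r.Intn(100) })
        bad := Gen[int](func(r *rand.Rand) int { return []int{0, -1, math.MaxInt}[r.Intn(3)] })
        return OneOf(small, bad)
    }

    // ShrinkInt is a naive shrinker: given a non-negative input for which
    // prop returns false, keep trying smaller candidates that still fail,
    // so the reported counterexample is easier to read than the raw value.
    func ShrinkInt(failing int, prop func(int) bool) int {
        for failing > 0 {
            shrunk := false
            for _, candidate := range []int{0, failing / 2, failing - 1} {
                if !prop(candidate) { // still fails: adopt the smaller input
                    failing, shrunk = candidate, true
                    break
                }
            }
            if !shrunk {
                break
            }
        }
        return failing
    }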
Not sure how effective it is compared to other options, but I'm not totally missing out there.
It also builds a corpus of interesting inputs over time (those that cause new branches to be taken, since that's its goal: improve coverage).
I'm less sure about combinators.
But I'd take one step back, and ask a more meta question about testing: does a successful test mean successful code, and vice-versa? Is there anything in Go's contract that specifies that the same inputs to the same code will yield the same output?
A successful test implies that the documentation is true. That is, after all, why you are writing tests: Documentation – To explain to future developers the intent and usage behind the code you wrote. You could just as well write the documentation in Word instead, but then you'd lose the ability to perform machine verification of the documentation, which is a huge boon to moving codebases where the documentation and code are likely to fall out of sync if one is not incredibly careful (although perhaps LLMs will some day bridge that gap?).
The documentation being true does not imply that the documentation is useful, of course. Writing useful documentation is still an art.
A few jobs ago I would often be on a team where the entire team had turned over several times; we would be asked to do large updates to a legacy application, or bump lots of dependencies, and just not break anything. When pushed, the product owner wouldn’t be able to describe what the app was supposed to do, particularly for unusual types of users (the account/user modelling was chaotic: say, several billing accounts per user, each with different products and access levels). At that point “foo calls bar” doesn’t clarify much intent.
I still think the usual combinators (map, filter, chain, and oneOf, to use fast-check’s names) can be a bit awkward, so I’m writing a new property testing library for JavaScript that hopefully works a bit nicer. (Experimental and not published yet.)
I find when I do this I don’t need to write by hand anywhere near as many tests to get my software working well. I also usually turn any failures found by the fuzzer into standalone unit tests, to make any regressions easier to find later.
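In Go specifically, one low-effort way to do that is to promote the failing input to a corpus seed. The sketch below reuses the hypothetical `runCommands` driver from earlier, and the concrete bytes are a made-up placeholder, not a real counterexample; seeds, like the crashers Go saves under `testdata/fuzz/`, are re-run by every plain `go test`:

    func FuzzCommands(f *testing.F) {
        // A crash found while fuzzing, added back as a seed so the regression
        // is re-checked on every ordinary `go test` run, not just while fuzzing.
        f.Add([]byte{0x04, 0x05, 0x04}) // placeholder bytes, not a real crasher
        f.Fuzz(runCommands)
    }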
I coded some tests that ensure the data structure is useful; many of them test examples from papers describing the data structure, but they don't necessarily cover all the corner cases.
Then I fuzzed it. I used Go's fuzzer, which is geared towards parsers and stuff. It can generate a stream of bytes and use that for fuzzing. The data structure is a set/map. So I interpret the stream of bytes as commands to add/remove random elements from the set/map. After I add an element, contains needs to return true; after I remove one, contains needs to return false; if I put in a mapping, finding it must return what I just put there, etc. And at every step all the data structure invariants (that ensure logarithmic search, etc.) need to hold.
That was stupidly effective at finding a few bugs, all within seconds, all with sequences of less than a dozen operations. Then it stops. And you get 100% coverage.
I'm assuming that, apart from ergonomics (I kinda have to build my own state-machine transitions out of a stream of bytes), the tooling actually seems more effective than property-testing libraries.
Still curious to understand what I'm missing out on.
For example, let's say our test data is an assoc-list like `[(Int, Customer)]`, and there's a bug that's triggered by repeated keys. LazySmallcheck will run the test on the most general assoc-list there is, a _|_ value!
    myTest (exception A) --> throws A

Since A was thrown, LSC will try again, making the list slightly more specific, which means trying both constructors `[]` and `:`:

    myTest [] --> test passes
    myTest (exception A:exception B) --> throws A

The input `[]` is completely specified, and it didn't lead to any counterexample, so LSC is finished with that branch. In the `:` branch, the general pair `exception A` caused an error, so LSC will make it slightly more specific and try again:

    myTest ((exception A, exception C):exception B) --> throws A

Since `exception A` is an Int, LSC can't use laziness to keep things general; it has to try concrete values, so it proceeds like Smallcheck, trying the "smallest" Int values:

    myTest ((0, exception C):exception B) --> throws B

Again, LSC will make the general list `exception B` more specific:

    myTest ((0, exception C):[]) --> passes
    myTest ((0, exception C):exception A:exception B) --> throws A

Since the input `(0, exception C):[]` passed the test without triggering the exception, we know the second element of that pair must be irrelevant to the result. In other words, LSC has proved the property holds for inputs `(0, X):[]` for all X! LSC hence won't try specialising `exception C`; but it will try "larger" Int values, e.g. `(1, exception C):[]`, `(-1, exception C):[]`, etc. up to a pre-defined "depth". I'll ignore these cases and focus on the branch with the longer list:

    myTest ((0, exception C):(exception A, exception D):exception B) --> throws A

Again, LSC will specialise the general Int `exception A` into specific values, starting with the "smallest":

    myTest ((0, exception C):(0, exception D):exception B) --> FAIL!

This input has duplicate keys, which triggered the bug and caused our test to fail. Again, the fact that the exceptions inside this value didn't get thrown means those parts were irrelevant to the result. LSC indicates irrelevant values with `_`, so it will report this counterexample as `(0, _):(0, _):_`. That's more informative for understanding this bug than e.g. Smallcheck or QuickCheck, whose "minimal" counterexamples would be fully-specified values, like `(0, Customer { name = "", orders = [], discount = Nothing, joined = 1970-01-01 }):(0, Customer { name = "", orders = [], discount = Nothing, joined = 1970-01-01 }):[]`.
Hence LSC's approach has three interesting features:
- It has a coverage-like approach that homes in on interesting cases, by gradually generating the parts of a value that get forced (which are usually the parts being branched on)
- It can indicate which parts of a counterexample are irrelevant for the failure
- It can prove some properties universally (i.e. when that input is irrelevant to the pass result)
https://www.tedinski.com/2018/12/11/fuzzing-and-property-tes...
Like, with the fuzz test below and the corresponding invariant checks, isn't this all but equivalent to property tests?
https://github.com/ncruces/aa/blob/505cbbf94973042cc7af4d6be...
https://github.com/ncruces/aa/blob/505cbbf94973042cc7af4d6be...