>But the before-test is strictly negative

I actually did this the other day on a piece of code. I was feeling a bit lazy, so I didn't write the test beforehand and figured that making the type checker catch it was enough. I still didn't write a test afterwards either, though.

Anecdotally, I've always found that tests which cover real-life bugs are in the class of tests with the highest chance of catching future regressions. So even if it does exist, I'm still mildly skeptical of the idea that a test which catches a bug the compiler has also been provoked into catching is strictly negative.

>And yet I see TDD practitioners as the primary source of such tests

I find the precise opposite to be true. TDD practitioners are more likely to tie tests to requirements, because they write them directly after getting the requirements. Test-after practitioners are more likely to tie tests to the implementation.

It's always possible to write a shit implementation-tied test with TDD, but the person who writes a shit test with TDD will write a shit implementation-tied test after, too. What did TDD have to do with that? Nothing.

>if you are dogmatically writing a test for every intermediate change, you will end up with lots of extra tests that assert things in order to satisfy the TDD dogma rather than the specific needs of the problem.

I find that this only really happens when you practice TDD with very loose typing. If you practice strict typing, the tests will invariably be narrowed down to ones which address the specific needs of the problem.

Again - without TDD, writing the test after, loose typing is still a shit show. So I see this as another issue that is separate from TDD.

>Obviously this can be avoided with judgement - but if you have sound independent judgement you don't need to adhere to specific philosophies

I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before". I've never thought the former, and the examples you've listed here look to me like examples of TDD failing to save somebody from a mistake that was about a separate issue (types, a poor-quality test). None of them are examples of "actually, writing the test after would have been better".

>When implementing to a spec you are absolutely right, but a very small amount of software is completely or even mostly specified in advance.

Why on earth would you do that? If I write even a single line of production code, I have specified what that line of code is going to do. I have watched juniors flail around and do this when given vague specs, but seniors generally try to nail a user story down tightly - with a combination of code investigation, spiking, and dialog with stakeholders - before writing code that they would otherwise have to toss in the trash if it wasn't fit for purpose.

To me this isn't related to TDD either. Whether or not I practice TDD, I don't fuck around writing or changing production code if I don't know precisely what result I want to achieve. Ever.

Future requirements will probably remain vague, but never the ones I'm implementing right now.

>I agree this can lead to brittle tests and lack of spec adherence, but if you are iterating on intermediate state and writing tests as you go, the structure of the code you wrote 30 seconds ago is very much influencing the test you're writing now.

Only if the spec is changing too. This sometimes happens if I discover some issue by looking at the code, but in general my test remains relatively static while the code underneath it iterates.

This obviously wouldn't happen if you wrote implementation-tied tests rather than specification-tied tests but... maybe just don't do that?
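
To make the distinction concrete, here's a minimal sketch of a specification-tied test next to an implementation-tied one, using a hypothetical Cache class (nothing from this thread - just an illustration):

    #include <cassert>
    #include <cstddef>
    #include <optional>
    #include <string>
    #include <unordered_map>

    // Hypothetical class, for illustration only.
    class Cache {
     public:
      void Put(const std::string& key, int value) { map_[key] = value; }
      std::optional<int> Get(const std::string& key) const {
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
      }
      // Internal detail exposed only so the bad test below compiles.
      std::size_t BucketCount() const { return map_.bucket_count(); }

     private:
      std::unordered_map<std::string, int> map_;
    };

    // Specification-tied: asserts observable behavior. Survives any
    // refactor that honors the contract.
    void SpecTiedTest() {
      Cache c;
      c.Put("a", 1);
      assert(c.Get("a") == 1);
      assert(!c.Get("b").has_value());
    }

    // Implementation-tied: asserts an internal detail of the current
    // implementation. Breaks on refactors, catches no real regressions.
    void ImplementationTiedTest() {
      Cache c;
      c.Put("a", 1);
      assert(c.BucketCount() >= 1);  // brittle: the contract never promised this
    }

    int main() {
      SpecTiedTest();
      ImplementationTiedTest();
    }

The first test keeps passing under any refactor that honors the contract; the second breaks on internal changes while catching nothing real.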

>Another issue is that fault injection tests basically require coupling to the implementation

All tests require coupling to the implementation in some way. The goal is to couple as loosely as possible while maximizing speed, ease of use, etc. I'm not really sure why fault injection should be treated as special. If you need to refactor the test harness to allow it, that's probably a really good idea.

>The way I prefer to write these is to write the implementation first, then write the fuzz test - add a few bugs in the implementation, and fix/enhance the fuzz test until it catches them. Fuzz testing is one of the best bang-for-buck testing

Fuzz testing is great and having preferences is fine, but once again fuzz testing says little about the efficacy of TDD (fuzz tests can be written both before and after), and a preference for test-after, I find, tends to mean little more than "I like my old habits".

>Fuzz testing is one of the best bang-for-buck testing methodologies there is, and in my experience it's very hard to write a really good fuzz test unless you already have most of your implementation

In my experience you can (I've done TDD with property tests and, well, I see fuzz testing as simply a subset of that). I also don't see any particular reason why you can't.
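
Here's roughly what "property test first" looks like as a hand-rolled sketch - no property-testing library, and a hypothetical Dedup function standing in for the code under test. The property is pinned down from the spec before the implementation exists:

    #include <algorithm>
    #include <cassert>
    #include <random>
    #include <set>
    #include <vector>

    // Hypothetical function under test - written *after* the property below.
    std::vector<int> Dedup(std::vector<int> v) {
      std::sort(v.begin(), v.end());
      v.erase(std::unique(v.begin(), v.end()), v.end());
      return v;
    }

    int main() {
      // The property comes straight from the spec: output is sorted,
      // has no duplicates, and contains exactly the distinct elements
      // of the input. Random inputs supply the "fuzz" part; a real
      // harness would log the seed so failures can be replayed.
      std::mt19937 rng(std::random_device{}());
      std::uniform_int_distribution<int> len(0, 50), val(-5, 5);
      for (int iter = 0; iter < 1000; ++iter) {
        std::vector<int> in(len(rng));
        for (int& x : in) x = val(rng);

        std::vector<int> out = Dedup(in);

        assert(std::is_sorted(out.begin(), out.end()));
        assert(std::adjacent_find(out.begin(), out.end()) == out.end());
        assert(std::set<int>(in.begin(), in.end())
               == std::set<int>(out.begin(), out.end()));
      }
    }

Nothing about that test needed the implementation to exist first.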

These methodologies, I find, do provide bang for the buck if you're writing a very specific kind of code - and I'll know in advance whether I'm writing that kind of code.

>This change alters a lock-free data structure to add a monotonicity invariant,

If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.

>This change moves a memory layout - again, I don't know how I would have written a test for this, besides something wild like querying smaps (not portable) to see if the final page of the arena allocation had faulted in.

I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug? If so, quite possibly a test would help. I've rarely been terribly sympathetic to the view that "writing a test to replicate this bug is too hard" is a good reason for not doing it. Programming is hard. I find that A) bugs often tend to cluster in scenarios that the testing infrastructure is ill-equipped to reproduce, and B) once you upgrade the testing infrastructure to handle those types of scenario, those bugs often poof... stop recurring.

>This change was written more in the way you recommend - but the whole change is basically a test. I debugged this by reading the code and thinking about it, then wrote up a pretty complicated fuzz test to help find any future races. I'm guessing that you would not consider adding debug asserts to be a violation of "write the test first"

I would file that under "tightening up typing" again and also file it under "the decision to write a test at all is distinct from the decision to write a test first".

>I don't think TDD would have led to better results in these cases

Again, I see no examples where writing a test after would have been better. There are just a few where you could argue that not writing a test at all is the correct course of action.


>Anecdotally, I've always found that tests which cover real-life bugs are in the class of tests with the highest chance of catching future regressions. So even if it does exist, I'm still mildly skeptical of the idea that a test which catches a bug the compiler has also been provoked into catching is strictly negative.

Maybe this is a static typing thing - if the test won't build because you've made the bug inexpressible in the type system, what test can you even write?
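
For example (hypothetical types, sketched purely for illustration): once element counts and byte sizes are distinct types, the original mix-up simply doesn't compile, so there is no failing test left to write:

    #include <cstddef>
    #include <cstdlib>

    // Hypothetical strong types. Passing one where the other belongs
    // now fails to compile.
    struct ElementCount { std::size_t value; };
    struct ByteSize     { std::size_t value; };

    ByteSize BytesFor(ElementCount n, std::size_t elem_size) {
      return ByteSize{n.value * elem_size};
    }

    void* AllocateBuffer(ByteSize bytes) { return std::malloc(bytes.value); }

    int main() {
      ElementCount n{128};
      void* p = AllocateBuffer(BytesFor(n, sizeof(double)));  // ok
      // void* q = AllocateBuffer(n);  // the old bug: no longer compiles
      std::free(p);
    }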

>If I write even a single line of production code I have specified what that line of code is going to do.

>combination of code investigation, spiking and dialog with stakeholders

Does "spiking" in this context mean writing code without TDD?

>I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before".

That's a weaker formulation of TDD than I've seen espoused and practiced, which is usually something more like "before you make any behavioral change to the code under test, you must write a test that fails; then make your edit so the test passes, and repeat". The problem with your approach, at least for me, is that until I've messed around a bit in the code to see what the best approach to solving a problem is, I don't know what the best structure for the final test is, or whether I can make the change in a way that leverages existing tests to detect the bug.

> I'm not really sure why fault injection should be treated as special.

Suppose you write a test reproducing a bug in code that reacts to a failed allocation. One common way to do that is to inject an allocator that fails on the specific allocation call you had a bug with. But how do you target that specific call? One way is to make the Nth allocation in the test fail - but even slight changes to the production code will make this test start testing something completely different. The solution is to have a fuzz test that injects failed allocations randomly per run - now you can be reasonably confident that even if the prod code changes over time, your fuzz test will still hit all the allocation sites, preventing regression. I don't see why it's beneficial to write this first or last. Under your model, the instrumented allocator setup is a replacement for the single test case that reproduces the original bug.
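
Sketched out, roughly (the Allocator interface and RunCodeUnderTest entry point are hypothetical stand-ins, not any real library's API):

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <random>

    // Hypothetical injection point: production code takes an Allocator
    // instead of calling malloc directly.
    struct Allocator {
      virtual void* Alloc(std::size_t n) = 0;
      virtual ~Allocator() = default;
    };

    // Fails each allocation independently with probability 1/denominator.
    // Seeded per fuzz run, so over many runs every allocation site in the
    // code under test eventually observes a failure - without the test
    // depending on "the Nth call", which drifts as production code changes.
    class FlakyAllocator : public Allocator {
     public:
      FlakyAllocator(std::uint64_t seed, int denominator)
          : rng_(seed), fail_(1, denominator) {}
      void* Alloc(std::size_t n) override {
        if (fail_(rng_) == 1) return nullptr;  // injected failure
        return std::malloc(n);
      }
     private:
      std::mt19937_64 rng_;
      std::uniform_int_distribution<int> fail_;
    };

    int main() {
      std::uint64_t seed = std::random_device{}();
      std::printf("seed=%llu\n", (unsigned long long)seed);  // for replay
      FlakyAllocator alloc(seed, /*denominator=*/10);
      // RunCodeUnderTest(alloc);  // hypothetical entry point; each run
      //                           // exercises a different failure pattern
    }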

>If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.

No, the original "bug" is that if you have two calls to SpaceAllocated and the second one races with a Fuse call on another thread, you can see a smaller result from the second call than from the first. This behavior wasn't guaranteed by the public API (so it's not a bug), but it's desirable to add. The fix is replacing a singly linked list with a doubly linked list, using tagged pointers to avoid adding storage cost for the second set of links. A test could be written for this: I could inspect the allocated addresses of three arenas and fuse them in a specific order, but I could not actually reproduce the required ordering without a test harness that allows full control of thread execution interleaving - which would be a huge amount of work and, in my opinion, would not actually prevent future bugs, since that interleaving only meaningfully exists in the previous implementation. The full set of possible interleavings is way too large to explore productively. If I had started by writing the test, I would have spent a bunch of time messing around before giving up; because I did the implementation first, I had a much more informed idea of what would be required to test it, and changed my plan for preventing future bugs to documentation.
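
For anyone unfamiliar with the trick: a tagged pointer stores metadata in bits that alignment guarantees are zero. A generic sketch of the idea (not the actual protobuf code):

    #include <cassert>
    #include <cstdint>

    // A pointer to anything with alignment >= 2 has a zero low bit,
    // which can carry one bit of metadata at no extra storage cost.
    template <typename T>
    class TaggedPtr {
     public:
      TaggedPtr(T* p, bool tag)
          : bits_(reinterpret_cast<std::uintptr_t>(p) | std::uintptr_t(tag)) {
        assert((reinterpret_cast<std::uintptr_t>(p) & 1) == 0);  // alignment
      }
      T* ptr() const { return reinterpret_cast<T*>(bits_ & ~std::uintptr_t(1)); }
      bool tag() const { return (bits_ & 1) != 0; }

     private:
      std::uintptr_t bits_;
    };

    struct Node { Node* next; };  // hypothetical list node

    int main() {
      Node n{nullptr};
      TaggedPtr<Node> link(&n, /*tag=*/true);  // e.g. marks a back-link
      assert(link.ptr() == &n && link.tag());
    }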

>I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug?

In this case the "bug" is not really a bug - when we allocate an arena, we put the overhead at the start rather than the end of the first memory block. The reason we're doing this is that while it's totally valid to use the provided memory however we want, we can avoid faulting an extra page if we store the overhead near where we're about to allocate from. So the test would have to check "on a virtual memory platform, do we write to memory in a way that maximizes spatial locality". This is possible to do, but it's a total mess of a test, and I have no idea how to do it on Windows. More importantly, anyone who changes this code is going to be doing it on purpose, and we don't guarantee to callers where we'll place any pointers we return.
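
For context, the layout reasoning looks roughly like this (hypothetical names and a typical 4 KiB page assumed - not the actual arena code):

    #include <cstddef>

    constexpr std::size_t kPageSize = 4096;  // typical, platform-dependent

    // Hypothetical per-block bookkeeping for an arena.
    struct BlockHeader {
      BlockHeader* next;
      std::size_t size;
      std::size_t used;
    };

    // Header at the start: the first user allocation lands right after
    // the header, on the same freshly-faulted page. A small arena
    // touches exactly one page.
    char* FirstAllocHeaderAtStart(char* block) {
      return block + sizeof(BlockHeader);
    }

    // Header at the end: writing the header faults the *last* page of
    // the block even if the user never allocates that far. For a 64 KiB
    // block and one small allocation, that's two faulted pages vs one.
    char* HeaderAtEndAddress(char* block, std::size_t block_size) {
      return block + block_size - sizeof(BlockHeader);
    }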

>Again, I see no examples where writing a test after would have been better.

If TDD produces better results for you, that's great. I think I made a case with real examples where using TDD would have been at best neutral, but in practice would have made me spend more time to get the same results.
