- >But the before-test is strictly negative
I actually did this the other day on a piece of code. I was feeling a bit lazy. I didn't write the test and I figured that making the type checker catch it was enough. I still didn't write a test after either though.
Anecdotally, I've always found that tests which cover real-life bugs are the class of test with the highest chance of catching future regressions. So even if the strictly negative case exists, I'm still mildly skeptical that a test is strictly negative just because the compiler was also provoked into catching the same bug.
>And yet I see TDD practitioners as the primary source of such tests
I find the precise opposite to be true. TDD practitioners are more likely to tie requirements to tests, because they write the tests directly after getting the requirements. Test-after practitioners are more likely to tie the tests to the implementation.
It's always possible to write a shit implementation-tied test with TDD, but the person who writes a shit test with TDD will write a shit implementation-tied test after too. What did TDD have to do with that? Nothing.
>if you are dogmatically writing a test for every intermediate change, you will end up with lots of extra tests that assert things in order to satisfy the TDD dogma rather than the specific needs of the problem.
I find that this only really happens when you practice TDD with very loose typing. If you practice strict typing, the tests will invariably be narrowed down to ones which address the specific needs of the problem.
Again - without TDD, writing the test after, loose typing is still a shit show. So I see this as another issue that is separate from TDD.
>Obviously this can be avoided with judgement - but if you have sound independent judgement you don't need to adhere to specific philosophies
I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before". I've never thought the former, but the examples you've listed here look to me only like examples of where TDD didn't save somebody from making a mistake that was about a separate issue (types, poor quality test). None of them are examples of "actually writing the test after would have been better".
>When implementing to a spec you are absolutely right, but a very small amount of software is completely or even mostly specified in advance.
Why on earth would you do that? If I write even a single line of production code, I have specified what that line of code is going to do. I have watched juniors flail around and do this when given vague specs, but seniors generally try to nail down a user story tightly - with a combination of code investigation, spiking and dialogue with stakeholders - before writing code they would otherwise have to toss in the trash if it wasn't fit for purpose.
To me this isn't related to TDD either. Whether or not I practice TDD, I don't fuck around writing or changing production code if I don't know precisely what result I want to achieve. Ever.
Future requirements will probably remain vague but never the ones I'm implementing right now.
>I agree this can lead to brittle tests and lack of spec adherence, but if you are iterating on intermediate state and writing tests as you go, the structure of the code you wrote 30 seconds ago is very much influencing the test you're writing now.
Only if the spec is changing too. This sometimes happens if I discover some issue by looking at the code, but in general my test remains relatively static while the code underneath it iterates.
This obviously wouldn't happen if you wrote implementation-tied tests rather than specification-tied tests but... maybe just don't do that?
>Another issue is that fault injection tests basically require coupling to the implementation
All tests require coupling to the implementation in some way. The goal is to couple as loosely as possible while maximizing speed, ease of use, etc. I'm not really sure why fault injection should be treated as special. If you need to refactor the test harness to allow it, that's probably a really good idea.
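To make the "couple as loosely as possible" point concrete, here's a rough sketch (all the names - `BlobStore`, `FlakyBlobStore`, `uploadReport` - are invented for illustration) of injecting faults through the same narrow interface the production code already depends on, rather than reaching into its guts:

```typescript
// Sketch only: fault injection via a decorator over the interface the code
// already uses. BlobStore, InMemoryBlobStore, FlakyBlobStore and uploadReport
// are hypothetical names, not from this discussion.
import { test, expect } from 'vitest';

interface BlobStore {
  put(key: string, data: Uint8Array): Promise<void>;
}

class InMemoryBlobStore implements BlobStore {
  blobs = new Map<string, Uint8Array>();
  async put(key: string, data: Uint8Array) {
    this.blobs.set(key, data);
  }
}

// Decorator that injects failures into the first N calls, then behaves normally.
class FlakyBlobStore implements BlobStore {
  private calls = 0;
  constructor(private inner: BlobStore, private failFirstN: number) {}
  async put(key: string, data: Uint8Array) {
    this.calls += 1;
    if (this.calls <= this.failFirstN) {
      throw new Error('injected fault: transient put failure');
    }
    return this.inner.put(key, data);
  }
}

// Hypothetical unit under test: retries once on failure.
async function uploadReport(store: BlobStore, key: string, data: Uint8Array) {
  try {
    await store.put(key, data);
  } catch {
    await store.put(key, data); // single retry, enough for the sketch
  }
}

test('uploadReport survives a transient storage failure', async () => {
  const inner = new InMemoryBlobStore();
  const store = new FlakyBlobStore(inner, 1); // fail the first call, succeed on retry
  await uploadReport(store, 'q3-report', new TextEncoder().encode('...'));
  expect(inner.blobs.has('q3-report')).toBe(true);
});
```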
>The way I prefer to write these is to write the implementation first, then write the fuzz test - add a few bugs in the implementation, and fix/enhance the fuzz test until it catches them. Fuzz testing is one of the best bang-for-buck testing
Fuzz testing is great and having preferences is fine, but once again fuzz testing says little about the efficacy of TDD (fuzz tests can be written both before and after) and preference for test-after I find tends to mean little more than "I like my old habits".
>Fuzz testing is one of the best bang-for-buck testing methodologies there is, and in my experience it's very hard to write a really good fuzz test unless you already have most of your implementation
In my experience you can (I've done TDD with property tests and, well, I see fuzz testing as simply a subset of that). I also don't see any particular reason why you can't.
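For example, the properties can be written against nothing but a signature you intend to implement. A minimal sketch using fast-check and vitest, where `normalizePath` is a hypothetical function that doesn't exist yet (that's the red step):

```typescript
// Property tests written before the implementation exists.
// `normalizePath` is hypothetical; the properties are the specification.
import { test } from 'vitest';
import fc from 'fast-check';
import { normalizePath } from './paths'; // not written yet - this test starts red

test('normalizePath is idempotent', () => {
  fc.assert(
    fc.property(fc.string(), (raw) => {
      const once = normalizePath(raw);
      return normalizePath(once) === once;
    })
  );
});

test('normalizePath never emits parent-directory segments', () => {
  fc.assert(
    fc.property(fc.string(), (raw) => !normalizePath(raw).split('/').includes('..'))
  );
});
```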
I find these methodologies do provide bang for the buck if you're writing a very specific kind of code, and I'll know in advance whether I'm writing that kind of code.
>This change alters a lock-free data structure to add a monotonicity invariant,
If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.
>This change moves a memory layout - again, I don't know how I would have written a test for this, besides something wild like querying smaps (not portable) to see if the final page of the arena allocation had faulted in.
I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug? If so, quite possibly a test would help. I've rarely been terribly sympathetic to the view that "writing a test to replicate this bug is too hard" is a good reason for not doing it. Programming is hard. I find that A) bugs often cluster in scenarios that the testing infrastructure is ill-equipped to reproduce, and B) once you upgrade the testing infrastructure to handle those scenario types, those bugs often... poof... stop recurring.
>This change was written more in the way you recommend - but the whole change is basically a test. I debugged this by reading the code and thinking about it, then wrote up a pretty complicated fuzz test to help find any future races. I'm guessing that you would not consider adding debug asserts to be a violation of "write the test first"
I would file that under "tightening up typing" again and also file it under "the decision to write a test at all is distinct from the decision to write a test first".
>I don't think TDD would have led to better results in these cases
Again, I see no examples where writing a test after would have been better. There are just a few where you could argue that not writing a test at all is the correct course of action.
- >I've often figured out late in the game how to make something a compile time failure rather than a runtime one
This is actually a good (albeit somewhat niche) reason to not write a test scenario at all, but it's still not a great reason to write a test after instead of before.
>Fundamentally the goal of testing is to describe what behaviors of the software are intentional rather than incidental
Yup. A test scenario which is of no interest to at least some stakeholders probably shouldn't be written at all.
This is again about whether to write a test at all, though, not whether to write it first.
>TDD mixes both concerns
I don't think writing a test after helps unmix those concerns any better.
In fact, it's probably a bit easier to link intentional behavior to a test while you have the spec in front of you and before the code is written.
I find people who write tests after tend (not always, but it's a strong tendency) to fit the test to the code rather than the requirement. This is really bad.
>Maybe other people work on different types of things and TDD is great for them, but I write primarily infrastructure code where correctness is critical and I have the luxury of time
Assuming I'm understanding you correctly (you're building something like Terraform?), integration tests which run scenarios matching real features against fake infra would seem pretty useful to me.
So... why won't you write tests with that harness before the code? I'm still unsure.
The only thing "special" about that type of code that I can see (and it isn't even all that special) is that unit tests would often be useless. But so what?
- I'm not sure quite why you feel you always need to write code before sussing out what an API or UI should look like, but it seems like a very expensive habit to me.
What happens when you then show it to stakeholders (e.g. other teams consuming your API, customers or UX people) and they tell you to change it again?
Rewrite everything again?
That's gonna be really labor intensive and could damage your code base too.
I'm equally perplexed about why people don't try to build top down. It's one of those few things in programming that always makes sense regardless of circumstance.
- A shit test written before writing the code is still a shit test. Mimetic tests aren't any better written after the code either.
If I had to choose between 1) always writing specification-linked tests that make as few architectural assumptions as possible and 2) TDD, sure, I'd pick 1 every time.
Doing both 1 and 2 is still better, though.
- >As for what is gained, try this spelling: test driven development adds load to your interfaces at a time when you know the least about the problem you are trying to solve
If I'm writing a single line of production code, I should first know as much as possible about what requirements problem I'm actually trying to solve with it, no?
This actually dovetails with a benefit of writing the test first. If you flesh out a user story scenario in the form of an executable test, it can provoke new questions ("hm, actually I'd need the user ID on this new endpoint to satisfy this requirement...") and you can more quickly return to stakeholders ("can you send me a user ID in this API call?") and "fix" your "requirements bugs" before making more expensive lower-level changes to the code.
This outside-in "flipping between one layer and the layer directly beneath it" is very effective at properly refining requirements, tests and architecture.
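As a rough sketch of what I mean (the endpoint, header and fields are invented for illustration), just spelling out the scenario as an executable test is what surfaces the missing input:

```typescript
// Hypothetical user-story test written before the endpoint exists.
// Writing the request down is what raises the question back to stakeholders:
// "this scenario can't be satisfied unless the caller sends a user ID".
import { test, expect } from '@playwright/test';

test('billing summary shows the latest invoice for the signed-in user', async ({ request }) => {
  // assumes baseURL is configured in playwright.config
  const response = await request.get('/api/billing/summary', {
    headers: { 'x-user-id': 'user-123' }, // <- the requirement gap surfaced while writing this
  });
  expect(response.status()).toBe(200);
  const body = await response.json();
  expect(body.latestInvoice.status).toBe('paid');
});
```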
>And thus, the technique gets criticism from both ends -- that design work that should have been done up front is deferred
I don't think "design work" should be done up front if you can help it. I've always felt that the very best architecture emerges as a result of aggressive refactoring done within the confines of a complete set of tests that made as few architectural assumptions as possible. Why? Because we're all bad at predicting the future and it's better if we don't try.
This is a mostly separate issue from TDD though.
- I've had this experience with team-specific vocab, where certain terms organically ended up with two or more conflicting meanings, and it was horrendous. It led to all sorts of bugs, misunderstandings and even arguments.
Even worse, most people didn't realize there was a problem because they always knew what they meant.
The only time I managed to work past it was by convincing everyone to never use that term again - burning it to the ground - and agreeing to replace it with two or more new, unambiguous terms.
I'd love to burn "unit test" and "integration test" to the ground but nobody outside my team listens to me :)
I'd probably replace them with:
* code coupled
* interface coupled
* high level
* low level
* xUnit
* faked infrastructural
* deployed infrastructural
* hermetic / non hermetic
* declarative / non declarative
- You need the final implementation before taking the final snapshot, but you can write the entire test up front (given/when). The snapshot artefact is generated, not written (often in a different file entirely), so I'd argue it still fits the definition cleanly.
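Roughly what I mean, sketched with vitest's snapshot support and a hypothetical `renderInvoiceEmail`:

```typescript
// The given/when is written up front; the "then" artefact is generated on the
// first passing run and stored by the runner in a separate __snapshots__ file.
// `renderInvoiceEmail` is a hypothetical function under test.
import { test, expect } from 'vitest';
import { renderInvoiceEmail } from './invoiceEmail';

test('overdue invoice email', () => {
  // given
  const invoice = { id: 'INV-42', amountCents: 12900, dueDate: '2024-01-31', overdue: true };
  // when
  const html = renderInvoiceEmail(invoice);
  // then: captured, not hand-written
  expect(html).toMatchSnapshot();
});
```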
I agree that "unit test"/"integration test" as a definition sucks horribly and leads to people talking past each other, but I think with TDD the main issue is that lots of people have developed a fixed and narrow idea of the kind of test you are "supposed" to write with it, which makes the process miserable if the type of code doesn't fit that type of test.
The whole idea of the unit test being "the" default kind of test, and of it meaning "tests a class/method as a unit", definitely needs to die.
- Well, OK... but then what kind of code doesn't it fit well?
Almost every user story I follow in production code takes the form of a given/when/then scenario, which can always be transformed into a test of some kind (e2e, integration, sometimes even unit).
Where it's something like "do x, y and z and then a graph appears", I find TDD with a snapshot test in, say, Playwright works best.
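Something like this sketch (the page, selectors and test ids are invented), where the "then" is a screenshot baseline generated on the first green run:

```typescript
// "Do x, y and z and then a graph appears" as a Playwright screenshot test.
// The route, selectors and test id are hypothetical.
import { test, expect } from '@playwright/test';

test('adding a CPU metric shows the usage graph', async ({ page }) => {
  await page.goto('/dashboard');                                   // x
  await page.getByRole('button', { name: 'Add metric' }).click();  // y
  await page.getByLabel('Metric').selectOption('cpu');             // z
  // then: a graph appears - compared against a stored screenshot baseline
  await expect(page.getByTestId('usage-graph')).toHaveScreenshot('usage-graph.png');
});
```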
- I've done this too. The exercise wasn't arrays (I'm militant about only setting very realistic tasks). My task required modifying existing production-like code and tests.
My hope was always that the candidates would do TDD where it seemed simple and obvious to do so. It was actually pretty rare, but the candidates who defaulted to doing that always ended up being better in my opinion. They were always made offers elsewhere that were higher than my company could afford (so I guess in others' opinions too).
In this thread https://www.hackerneue.com/item?id=43060636 I pondered why most people don't default to TDD for production code, and the answer invariably seemed to be "we didn't think TDD was a thing you could do with integration/e2e tests".
- Not necessarily. On plenty of projects I have done 100% TDD and never written a single low level unit test.
The type of test is, in my mind, a completely separate topic from red-green-refactor, and for the decision about which type to write I follow a set of rules that is also unconnected.
TDD is just red-green-refactor. It works with any test.
- I usually start with a basic e2e that tests the most minimal happy path possible. It makes no assumptions about architecture or anything else.
You don't need something to work with in order to write it. You can, by definition, write an e2e test against an app that doesn't exist.
This test isn't special as far as TDD is concerned - red-green-refactor works the same way.
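For what it's worth, that first test usually looks something like this sketch (the route, labels and copy are all invented) - nothing in it presumes a framework, a database or any internal structure:

```typescript
// A first, minimal happy-path e2e written before any application code exists.
// It only assumes a route and some user-visible text.
import { test, expect } from '@playwright/test';

test('a visitor can sign up and land on an empty dashboard', async ({ page }) => {
  await page.goto('/signup');
  await page.getByLabel('Email').fill('jane@example.com');
  await page.getByRole('button', { name: 'Sign up' }).click();
  await expect(page.getByText('Your dashboard is empty')).toBeVisible();
});
```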
I'm sensing a pattern in the answers to my question though. I keep getting "well, if you assume TDD is only done with low level unit tests..."
- >Sometimes you need the unit before you can unit test.
Right. In those situations I TDD with an e2e or integration test.
I don't get why you'd restrict yourself to doing TDD with just low level unit tests.
- I hate coding fundamentalism with a passion too. The only thing I get really religious about in coding is the importance of trade offs.
For me, the cost/benefit of writing a test before has just consistently exceeded that of doing it after.
Same for integration, e2e or unit tests (there's never been a rule that says you can only TDD with a unit test).
The cost/benefit trade-off for tests with mocks vs. a database is a different topic - orthogonal to the practice of red/green/refactor, and one where IMO the trade-offs are much less obvious.
- Not the only value though. Red-green-refactor can also provide live feedback about whether your code is behaving correctly as you write it.
Requiring the test before writing the code also ensures you don't forget to write a test to match the scenario.
So what is gained by testing after... is that it's almost as good?
I still don't get it.
- I still find the skepticism around TDD weird. Except for a few pretty niche scenarios (e.g. it's experimental code, or manual testing is cheaper for some obscure reason) I don't really see the point of not doing it.
I especially don't see what is gained by writing the test after.
- >my understanding of the problem only really forms through writing code and seeing what approaches work.
Unless you are working with a new/untested technology or approach (i.e. you need a spike), the same kind of understanding should form while writing the test scenario.
>Maybe this should be a separate prototype phase
I always either spike (in which case I never TDD) or write production code (in which case I always do). I can't see under what circumstances anybody would want to convert spike code - bashed out as quickly as possible to prove a point - into production code.
- Depends upon the ORM. Like all frameworks, a really good one is a significant productivity boost while a bad one is far worse than none at all.
- Jane logs in, enters her DOB, which is X (say, 11/5/1998), does Y, and the result is Z.
Where X, Y and Z are very specific.
These example scenarios work well as a communication medium for discussing intended program behavior, and they translate well into tests.
>enumerate every possible combination
Whereas if you start doing this you will probably confuse your stakeholders.
Specific examples tend to make better specifications.
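As a sketch of the kind of thing I mean (the age-restriction domain and `canPurchase` are invented, not from this thread), the specific example reads as both the spec and the test:

```typescript
// One concrete example scenario in given/when/then form. The domain and the
// `canPurchase` function are hypothetical, purely for illustration.
import { test, expect } from 'vitest';

interface Person { name: string; dateOfBirth: string }
interface Item { sku: string; minimumAge: number }

// Hypothetical function under test (approximate age calculation is fine here).
function canPurchase(buyer: Person, item: Item, onDate: Date): boolean {
  const ageYears =
    (onDate.getTime() - new Date(buyer.dateOfBirth).getTime()) / (365.25 * 24 * 3600 * 1000);
  return ageYears >= item.minimumAge;
}

test('Jane, born 11 May 1998, can buy an 18+ item on 1 March 2024', () => {
  // given
  const jane: Person = { name: 'Jane', dateOfBirth: '1998-05-11' };
  const item: Item = { sku: 'WINE-001', minimumAge: 18 };
  // when
  const result = canPurchase(jane, item, new Date('2024-03-01'));
  // then
  expect(result).toBe(true);
});
```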
- >something about the layers below
Absolutely. They were tests over a big ball of mud in a company I had joined recently.
This is, I think, the only good way to work with what is probably (unfortunately) the most common type of real world code architecture.
If your testing approach can't deal with big, fragile balls of mud then it is bad. This is why I don't have a lot of respect for the crowd that thinks you must do DI first "in order to be able to test". Such architectures are fragile and will break under attempts to introduce dependency inversion.
>Compared to testing APIs or Unit tests they are though.
In the above example there probably wasn't a single code interface or API under the hood that was any good. Coupling to any of those interfaces was fragile with a capital F if you actually expected to refactor any of them (which I did).
Even for decent quality code, the freedom to refactor interfaces is wildly underrated, and it is curtailed by coupling tests to them.
- By faking the DB I meant either running a local, prefilled fake DB server for every test or faking the interface to the DB.
Which one you should do depends on how complex your interactions with the DB are.
Some apps (e.g. CRUD) have half of their business logic encoded in DB queries in which case faking the calls is a bad idea.
Others only do, like, 2 simple queries. In this case there's no point running an actual database outside of a couple of E2E tests.
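To illustrate the "faking the interface" end of that spectrum (the repository and `signIn` are invented names), the fake is just an in-memory implementation of the same small interface the production code uses:

```typescript
// Sketch of faking the interface to the DB rather than running a database.
// UserRepository, InMemoryUserRepository and signIn are hypothetical names.
import { test, expect } from 'vitest';

interface User { id: string; email: string }

interface UserRepository {
  findByEmail(email: string): Promise<User | null>;
}

class InMemoryUserRepository implements UserRepository {
  constructor(private users: User[]) {}
  async findByEmail(email: string) {
    return this.users.find((u) => u.email === email) ?? null;
  }
}

// Hypothetical unit under test.
async function signIn(repo: UserRepository, email: string): Promise<{ ok: boolean }> {
  const user = await repo.findByEmail(email);
  return { ok: user !== null };
}

test('sign-in rejects unknown emails', async () => {
  const repo = new InMemoryUserRepository([{ id: '1', email: 'jane@example.com' }]);
  const result = await signIn(repo, 'nobody@example.com');
  expect(result.ok).toBe(false);
});
```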
- If you're working with a big ball of mud, I find that the best approach is to immediately start doing TDD with hermetic end to end tests.
Hermetic = the tests could run just fine on their own on a freshly installed OS that is cut off from the internet.
The first tests you build this way will be extraordinarily expensive (faking databases & http calls is fiddly), but they pay enormous dividends.
Once you have a large enough body of these and you've refactored some clean interfaces underneath, you can start writing future tests against those.
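One of the fiddly ingredients, sketched with Playwright's request interception (the payment-provider URL and page copy are invented; this only covers calls made from the browser - server-side calls need their own local fake):

```typescript
// Part of a hermetic e2e: third-party HTTP is stubbed so the test needs no
// network access. A full hermetic setup would also spin up a local, prefilled DB.
import { test, expect } from '@playwright/test';

test('checkout succeeds against a stubbed payment provider', async ({ page }) => {
  await page.route('**/payments.example.com/**', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ status: 'approved' }),
    })
  );
  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Pay now' }).click();
  await expect(page.getByText('Payment approved')).toBeVisible();
});
```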
- The other extreme of this is:
* Bad abstractions which just stick around forever. There are some examples of this in UNIX which would never be invented in the way they are today but nonetheless aren't going anywhere (e.g. signal handling). This isn't good.
* Invent all of your own wheels. This isn't good either.
There's a balance that needs to be struck between all of these 3 extremes.
- >However, very few developers follow this approach religiously
I do it pretty religiously. There are 3 exceptions:
1) I'm doing a spike (i.e. what the author calls exploratory code), in which case the code will probably be disposed of. This is the one main exception.
2) I'm just tweaking a config value/printed message/something else surface level.
3) The cost of building test infrastructure is prohibitive (if it's a long running project I will aim to keep building that infra until it is possible though...).
That's it. As far as I can tell there aren't other scenarios where it isn't a good idea.
- >I believe you other than tests being specifications
If yours aren't specifications, that suggests you're not writing them right, which in turn suggests why you might have an issue with them...
- When I do TDD (virtually every time I write a line of code), each test scenario isn't just a way to verify that the code is working; it's also a specification - often for a previously unconsidered edge case.
Throwing away the test means throwing away that user story and the value that comes with it.
- I think the premise is correct and I think you are disagreeing with it.
Yes, the pyramid was set out as a goal in its original incarnation. That was deeply wrong. The shape ought to be emergent and determined by the nature of the app being tested (I went into detail on what should determine that here: https://www.hackerneue.com/item?id=42709404).
Some of the most useful tests I've worked with HAVE had a large GUI tip. The GUI was the most stable surface, its behavior was clearly defined and agreed upon by everybody, and all the code got tested. GUI tests provided the greatest freedom to refactor, covered the most bugs and provided the most value by far on that project.
GUI tests are not inherently fragile or inherently too slow either. This is just a tendency that is highly context specific, and as the "pyramid" demonstrates - if you build a rule out of a tendency that is context specific it's going to be a shit rule.
- I'd say that if you think tests and types are doing the same thing in the same way, you are badly abusing at least one of them.
One attacks the problem of bugs from the bottom up and the other from the top down. They both have diminishing returns on investment the closer they get to overlapping on covering the same types of bug.
The Haskell bros who think tests don't do anything useful because "a good type system covers all bugs" haven't themselves really delivered anything useful.
- You can get 100% coverage by focusing on testing the public API too. These two things are completely orthogonal.
If you change the spec (e.g. changing the contract on a REST API), you will probably need to consult the people affected to make sure it aligns with everybody's expectations. Does the team calling it even have the customer ID you've just decided to require on, say, this new endpoint?
>You seem to have forgotten what I said, something needs to exist for me to work with.
No. I'm assuming here that a code base exists and that you are mostly (if not 100%) familiar with it.