This is hilarious because it's a perfect interview (it detects precisely the thing they're testing for) but it also creates total adverse selection, because the thing they're testing for is ridiculous.

I've done this too. The exercise wasn't arrays (I'm militant about only setting very realistic tasks). My task required modifying existing production-like code and tests.

My hope was always that the candidates would do TDD where it seemed simple and obvious to do so. It was actually pretty rare, but the candidates that defaulted to doing that always ended up being better in my opinion. They were always made offers elsewhere that were higher than my company could afford (so I guess in others' opinions too).

In this thread https://www.hackerneue.com/item?id=43060636 I pondered why most people don't default to TDD for production code, and the answer invariably seemed to be "we didn't think TDD was a thing you could do with integration/e2e tests".

I think having tests for all your diffs at the level of published commits/change lists/etc is totally reasonable for software you really care about. What's counterproductive is practicing TDD at the level of individual editor operations.

If I'm fixing a bug, I start by writing a test that reproduces the bug. If I can't do that, I fix the test harness until I can. Then I implement the change, making mental notes of each intermediate bug I think about along the way - things like "I should be careful to name this distinctly so that it's not confused with this other value in scope that has the same type". After that, I cull down that list until it's reasonable and not totally paranoid, and write tests covering those cases. Same thing for any bugs in in-progress code caught by manual testing, fuzzers, etc.

If you have discipline and use version control, you don't need to write tests before you write the actual code to get the same level of coverage as TDD and you waste a lot less time. I've often figured out late in the game how to make something a compile time failure rather than a runtime one - time to delete all those tests written along the way? Encode them all as negative compilation tests? Fundamentally the goal of testing is to describe what behaviors of the software are intentional rather than incidental, and to detect bugs that might be introduced by future changes to the software - TDD mixes both concerns and doesn't put any emphasis on preventing future bugs specifically.
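
A minimal sketch of what I mean, in C (the names are hypothetical, nothing from real code): give two quantities that share an underlying type distinct wrapper types, and the mix-up that a runtime test used to guard against simply stops compiling.

    /* Hypothetical sketch: turn a value mix-up into a compile-time failure
     * by wrapping two same-typed quantities in distinct struct types. */
    #include <stddef.h>
    #include <stdio.h>

    typedef struct { size_t v; } ByteCount;     /* a size in bytes */
    typedef struct { size_t v; } ElementCount;  /* a number of elements */

    static ByteCount bytes_for(ElementCount n, size_t elem_size) {
      return (ByteCount){n.v * elem_size};
    }

    static void reserve(ByteCount capacity) {
      printf("reserving %zu bytes\n", capacity.v);
    }

    int main(void) {
      ElementCount n = {128};
      reserve(bytes_for(n, sizeof(int)));
      /* reserve(n);  <-- no longer compiles; with plain size_t it was a runtime bug */
      return 0;
    }

Once the confusion is inexpressible, the runtime tests that guarded it are exactly the ones left deciding between deletion and negative compilation tests.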

Maybe other people work on different types of things and TDD is great for them, but I write primarily infrastructure code where correctness is critical and I have the luxury of time, and TDD doesn't produce better results for me. This is a case TDD feels like it should work well for, but in my experience it doesn't improve correctness, maintainability, or speed of delivery - at least compared to the alternative I described. I'm sure there's a universe of teams with sloppy practices out there that TDD would be an improvement for, but it's not helpful for me.

>I've often figured out late in the game how to make something a compile time failure rather than a runtime one

This is actually a good (albeit somewhat niche) reason to not write a test scenario at all, but it's still not a great reason to write a test after instead of before.

>Fundamentally the goal of testing is to describe what behaviors of the software are intentional rather than incidental

Yup. A test scenario which is of no interest to at least some stakeholders probably shouldn't be written at all.

This is again about whether to write a test at all, though, not whether to write it first.

>TDD mixes both concerns

I don't think writing a test after helps unmix those concerns any better.

In fact it's probably a bit easier to link intentional behavior to a test while you have the spec in front of you and before the code is written.

I find people who write tests after tend (not always, but it's a strong tendency) to fit the test to the code rather than the requirement. This is really bad.

>Maybe other people work on different types of things and TDD is great for them, but I write primarily infrastructure code where correctness is critical and I have the luxury of time

Assuming I'm understanding you correctly (you're building something like terraform?), integration tests which run scenarios matching real features against fake infra would seem to be pretty useful to me.

So... why won't you write tests with that harness before the code? I'm still unsure.

The only thing "special" about that type of code that I can see (which isn't even all that special) is that unit tests would often be useless. But so what?

>This is actually a good (albeit somewhat niche) reason to not write a test scenario at all, but it's still not a great reason to write a test after instead of before.

But the before-test is strictly negative - it's a waste of time (deleted code, never submitted) and it possibly slowed down development (had to update the test as I messed with APIs).

>Yup. A test scenario which is of no interest to at least some stakeholders probably shouldn't be written at all.

And yet I see TDD practitioners as the primary source of such tests - if you are dogmatically writing a test for every intermediate change, you will end up with lots of extra tests that assert things in order to satisfy the TDD dogma rather than the specific needs of the problem. Obviously this can be avoided with judgement - but if you have sound independent judgement you don't need to adhere to specific philosophies about the order you make changes in.

>In fact it's probably a bit easier to link intentional behavior to a test while you have the spec in front of you and before the code is written.

When implementing to a spec you are absolutely right, but a very small amount of software is completely or even mostly specified in advance.

>I find people who write tests after tend (not always, but it's a strong tendency) to fit the test to the code rather than the requirement. This is really bad.

I agree this can lead to brittle tests and lack of spec adherence, but if you are iterating on intermediate state and writing tests as you go, the structure of the code you wrote 30 seconds ago is very much influencing the test you're writing now.

Another issue is that fault injection tests basically require coupling to the implementation - "make the Nth allocation fail" etc. The way I prefer to write these is to write the implementation first, then write the fuzz test - add a few bugs in the implementation, and fix/enhance the fuzz test until it catches them. Fuzz testing is one of the best bang-for-buck testing methodologies there is, and in my experience it's very hard to write a really good fuzz test unless you already have most of your implementation, so you can ensure your fuzz tester is actually exercising the stuff you want it to.
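
To make the coupling concrete, here's roughly what such a fault-injection shim looks like - a sketch with made-up names, not the protobuf/upb allocator API. The "fail on the Nth call" knob is exactly the part that's tied to the implementation's allocation order.

    /* Sketch of "make the Nth allocation fail" fault injection; the names
     * are illustrative only. */
    #include <stdlib.h>

    static int g_allocs_until_failure = -1;  /* -1 means never fail */

    void *test_malloc(size_t size) {
      if (g_allocs_until_failure == 0) return NULL;  /* inject the failure */
      if (g_allocs_until_failure > 0) g_allocs_until_failure--;
      return malloc(size);
    }

A test that sets g_allocs_until_failure = 2 to hit "the third allocation in this path" silently retargets itself the moment a refactor adds or removes an allocation, which is why I lean on the randomized fuzz version instead.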

>Assuming I'm understanding you correctly (you're building something like terraform?),

I write library code for mobile phones, mostly in Java/Kotlin. I recently did some open source work (warning: I am not actually very proficient with C; any good results are due to enormous time spent and to my code reviewers; constructive criticism very much welcome). Here are a few somewhat small, contained changes of mine, so we can talk about something concrete:

https://github.com/protocolbuffers/protobuf/pull/19893/files

This change alters a lock-free data structure to add a monotonicity invariant for when the space allocated is queried on an already-fused arena while racing with another fuse. I didn't add tests for this - I spent a fair bit of time thinking about how to do it, and decided that the type of test I would have to write to reliably reproduce this was not going to be net better at preventing a future bug, given its cost, than a comment in the implementation code and markdown documentation of the data structure. I don't know how I would really have made this change with a TDD methodology.

https://github.com/protocolbuffers/protobuf/pull/19933/files

This change moves a memory layout - again, I don't know how I would have written a test for this, besides something wild like querying smaps (not portable) to see if the final page of the arena allocation had faulted in.

https://github.com/protocolbuffers/protobuf/pull/19885/files

This change was written more in the way you recommend - but the whole change is basically a test. I debugged this by reading the code and thinking about it, then wrote up a pretty complicated fuzz test to help find any future races. I'm guessing that you would not consider adding debug asserts to be a violation of "write the test first"? So in this case, I followed TDD's order - not because I was following TDD, but because the code change was trivial and all the hard work was thinking about the data structures and memory model.

https://github.com/protocolbuffers/protobuf/pull/19688/files

All the tests were submitted before the implementation change here, but not because of TDD - in this case, I was trying to optimize performance, and wrote the whole implementation before any new tests - because changing the implementation required changing the API to no longer expose contiguous memory. But I did not want to churn all the users of the public API unless I knew my implementation was actually going to deliver a performance improvement - so I didn't write any tests for the API until I had the implementation pretty well in hand. Good thing too, because I actually had to alter the new API's behavior a few times to enable the performance I wanted, and if I had written all the tests as I went along, I'd have had to go back and rewrite them over and over. So in this case I wrote the implementation, got it how I wanted it, wrote and submitted the new API (implemented at first on the old implementation) and added tests, updated all callers to the new API, and then submitted the new implementation.

I don't think TDD would have led to better results in these cases, but you sound like a TDD believer and I'm always interested to hear anything that would make my engineering better.

>But the before-test is strictly negative

I actually did this the other day on a piece of code. I was feeling a bit lazy. I didn't write the test and I figured that making the type checker catch it was enough. I still didn't write a test after either though.

Anecdotally I've always found that tests which cover real life bugs are in the class of test with the highest chance of catching future regressions. So even if it does exist, I'm still mildly skeptical of the idea that tests that catch bugs that compilers have been provoked into also catching are strictly negative.

>And yet I see TDD practitioners as the primary source of such tests

I find the precise opposite to be true. TDD practitioners are more likely to tie requirements to tests because they write the tests directly after getting the requirements. Test-after practitioners are more likely to tie the implementation to the test.

It's always possible to write a shit implementation-tied test with TDD, but the person who writes a shit test with TDD will write a shit implementation-tied test after, too. What did TDD have to do with that? Nothing.

>if you are dogmatically writing a test for every intermediate change, you will end up with lots of extra tests that assert things in order to satisfy the TDD dogma rather than the specific needs of the problem.

I find that this only really happens when you practice TDD with very loose typing. If you practice strict typing, the tests will invariably be narrowed down to ones which address the specific needs of the problem.

Again - without TDD and writing the test after, loose typing is still a shit show. So, I see this as another issue which is about something separate from TDD.

>Obviously this can be avoided with judgement - but if you have sound independent judgement you don't need to adhere to specific philosophies

I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before". I've never thought the former, but the examples you've listed here look to me only like examples of where TDD didn't save somebody from making a mistake that was about a separate issue (types, poor quality test). None of them are examples of "actually writing the test after would have been better".

>When implementing to a spec you are absolutely right, but a very small amount of software is completely or even mostly specified in advance.

Why on earth would you do that? If I write even a single line of production code I have specified what that line of code is going to do. I have watched juniors flail around like this when given vague specs, but seniors generally try to nail a user story down tight with a combination of code investigation, spiking and dialog with stakeholders before writing code they would otherwise have to toss in the trash if it wasn't fit for purpose.

To me this isn't related to TDD either. Whether or not I practice TDD, I don't fuck around writing or changing production code if I don't know precisely what result it is I want to achieve. Ever.

Future requirements will probably remain vague but never the ones I'm implementing right now.

>I agree this can lead to brittle tests and lack of spec adherence, but if you are iterating on intermediate state and writing tests as you go, the structure of the code you wrote 30 seconds ago is very much influencing the test you're writing now.

Only if the spec is changing too. This sometimes happens if I discover some issue by looking at the code, but in general my test remains relatively static while the code underneath it iterates.

This obviously wouldn't happen if you wrote implementation-tied tests rather than specification-tied tests but... maybe just don't do that?

>Another issue is that fault injection tests basically require coupling to the implementation

All tests require coupling to the implementation in some way. The goal is to couple as loosely as possible while maximizing speed, ease of use, etc. I'm not really sure why fault injection should be treated as special. If you need to refactor the test harness to allow it, that's probably a really good idea.

>The way I prefer to write these is to write the implementation first, then write the fuzz test - add a few bugs in the implementation, and fix/enhance the fuzz test until it catches them. Fuzz testing is one of the best bang-for-buck testing

Fuzz testing is great and having preferences is fine, but once again fuzz testing says little about the efficacy of TDD (fuzz tests can be written both before and after), and I find a preference for test-after tends to mean little more than "I like my old habits".

>Fuzz testing is one of the best bang-for-buck testing methodologies there is, and in my experience it's very hard to write a really good fuzz test unless you already have most of your implementation

In my experience you can (I've done TDD with property tests and, well, I see fuzz testing as simply a subset of that). I also don't see any particular reason why you can't.
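
For instance, here's the shape of a property test written before the implementation (a toy sketch with a hypothetical clamp function, nothing to do with the code under discussion): the property is stated against a declared signature before the body exists, which is the sense in which it comes first.

    /* Toy property-test-first sketch; clamp() is hypothetical. */
    #include <assert.h>
    #include <stdlib.h>

    int clamp(int x, int lo, int hi);  /* declared first; body written after the test */

    static void property_clamp(void) {
      for (int i = 0; i < 100000; i++) {
        int lo = rand() % 1000 - 500;
        int hi = lo + rand() % 1000;
        int x = rand() % 4000 - 2000;
        int y = clamp(x, lo, hi);
        assert(lo <= y && y <= hi);              /* result is always in range */
        if (lo <= x && x <= hi) assert(y == x);  /* in-range inputs pass through */
      }
    }

    int clamp(int x, int lo, int hi) { return x < lo ? lo : (x > hi ? hi : x); }

    int main(void) {
      property_clamp();
      return 0;
    }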

These methodologies I find do provide bang for the buck if you're writing a very specific kind of code. I will know if I'm writing that type of code in advance.

>This change alters a lock-free data structure to add a monotonicity invariant,

If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.

>This change moves a memory layout - again, I don't know how I would have written a test for this, besides something wild like querying smaps (not portable) to see if the final page of the arena allocation had faulted in.

I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug? If so, quite possibly a test would help. I've rarely been terribly sympathetic to the view that "writing a test to replicate this bug is too hard" is a good reason for not doing it. Programming is hard. I find that A) bugs often tend to cluster in scenarios that the testing infrastructure is ill equipped to reproduce, and B) once you upgrade the testing infrastructure to handle those scenario types, those bugs often just poof... stop recurring.

>This change was written more in the way you recommend - but the whole change is basically a test. I debugged this by reading the code and thinking about it, then wrote up a pretty complicated fuzz test to help find any future races. I'm guessing that you would not consider adding debug asserts to be a violation of "write the test first"

I would file that under "tightening up typing" again and also file it under "the decision to write a test at all is distinct from the decision to write a test first".

>I don't think TDD would have led to better results in these cases

Again, I see no examples where writing a test after would have been better. There are just a few where you could argue that not writing a test at all is the correct course of action.

>Anecdotally I've always found that tests which cover real life bugs are in the class of test with the highest chance of catching future regressions. So even if it does exist, I'm still mildly skeptical of the idea that tests that catch bugs that compilers have been provoked into also catching are strictly negative.

Maybe this is a static typing thing - if the test won't build because you've made the bug inexpressible in the type system, what test can you even write?

>If I write even a single line of production code I have specified what that line of code is going to do.

>combination of code investigation, spiking and dialog with stakeholders

Does "spiking" in this context mean writing code without TDD?

>I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before".

That's a weaker formulation of TDD than I've seen espoused and practiced, which is usually something more like "before you make any behavioral changes to the code under test, you must write a test that fails; then make your edit so the test passes, and repeat". The problem with your approach, at least for me, is that until I've messed around a bit in the code to see what the best approach to solving a problem is, I don't know what the best structure for a final test is, or whether I can make the change in a way that leverages existing tests to detect the bug.

> I'm not really sure why fault injection should be treated as special.

Suppose you write a test reproducing a bug in how the code reacts to a failed allocation. One common way to do that is to inject an allocator that fails on the specific allocation call you had a bug with. But how do you target that specific call? One way is to make the Nth allocation in a test fail - but even slight changes to the production code will make this test start testing something completely different. The solution is to have a fuzz test that injects failed allocations randomly per run - now you can be reasonably confident that even if the prod code changes over time, your fuzz test will still exercise all the allocation sites, preventing regression. I don't see why it's beneficial to write this first or last. Under your model, the instrumented allocator setup is a replacement for the single test case that reproduces the original bug.
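
Sketched out (illustrative only - code_under_test() and the globals are made up, LLVMFuzzerTestOneInput is just libFuzzer's standard entry point, and this is not the actual protobuf harness), the idea is that the fuzzer's input decides which allocations fail, so nothing is pinned to "the Nth call":

    /* Sketch of randomized allocation-failure injection driven by fuzz input. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    static const uint8_t *g_fault_bits;
    static size_t g_fault_len, g_fault_pos;

    static void *fuzz_malloc(size_t size) {
      /* Consume one bit of fuzz input per allocation; fail when it's set. */
      if (g_fault_pos < g_fault_len * 8) {
        int bit = (g_fault_bits[g_fault_pos / 8] >> (g_fault_pos % 8)) & 1;
        g_fault_pos++;
        if (bit) return NULL;
      }
      return malloc(size);
    }

    static void code_under_test(void *(*alloc)(size_t)) {
      /* Stand-in with two allocation sites; both must tolerate NULL. */
      char *a = alloc(16);
      char *b = alloc(32);
      free(a);
      free(b);
    }

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
      g_fault_bits = data;
      g_fault_len = size;
      g_fault_pos = 0;
      code_under_test(fuzz_malloc);
      return 0;
    }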

>If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.

No, the original "bug" is that if you have two calls to SpaceAllocated and the second one races with a Fuse call on another thread, you can see a smaller result than the earlier call returned. This behavior wasn't guaranteed by the public API (so not a bug), but it's desirable to add. The fix is replacing a singly linked list with a doubly linked list, using tagged pointers to avoid adding storage cost for the second set of links. A test could be written for this; I could inspect the allocated addresses of three arenas and fuse them in a specific order, but I could not actually reproduce the ordering required without a test harness that allows full control of thread execution interleaving - which would be a huge amount of work and, in my opinion, would not actually prevent future bugs, since that interleaving only meaningfully exists in the previous implementation. The full set of possible interleavings is way too large to productively explore. If I had started by writing the test, I would have spent a bunch of time messing around before giving up; because I did the implementation first, I had a much more informed idea of what would be required to test it, and changed my plan for preventing future bugs to documentation.
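
For anyone curious what "tagged pointers to avoid adding storage cost" means in general, here's a generic sketch (not the actual upb arena code): when nodes have at least 2-byte alignment, the low bit of a link pointer is always zero, so a flag can live there instead of in an extra field.

    /* Generic tagged-pointer sketch; not the upb implementation. */
    #include <assert.h>
    #include <stdint.h>

    typedef struct Node Node;
    struct Node {
      uintptr_t next_and_tag;  /* next pointer with a 1-bit tag in the low bit */
      uintptr_t prev;          /* the added back-link in this toy version */
    };

    static Node *node_next(const Node *n) {
      return (Node *)(n->next_and_tag & ~(uintptr_t)1);
    }

    static int node_tag(const Node *n) { return (int)(n->next_and_tag & 1); }

    static void node_set_next(Node *n, Node *next, int tag) {
      assert(((uintptr_t)next & 1) == 0);  /* relies on pointer alignment */
      n->next_and_tag = (uintptr_t)next | (uintptr_t)(tag != 0);
    }

    int main(void) {
      Node a = {0, 0}, b = {0, 0};
      node_set_next(&a, &b, 1);
      assert(node_next(&a) == &b && node_tag(&a) == 1);
      return 0;
    }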

>I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug?

In this case the "bug" is not really a bug - if we are allocating an arena, we put the overhead at the start rather than the end of the first memory block. The reason we're doing this is that while it's totally valid to use the provided memory however we want, we can avoid faulting an extra page if we store the overhead near where we're about to allocate from. So the test would have to check "on a virtual memory platform, do we write to memory in a way that maximizes spatial locality?" This is possible to do, but it's a total mess of a test, and I have no idea how to do it on Windows. More importantly, anyone who changes this code is going to be doing it on purpose, and we don't guarantee to callers where we'll place any pointers we return.

>Again, I see no examples where writing a test after would have been better.

If TDD produces better results for you, that's great. I think I made a case with real examples where using TDD would have been at best neutral, but in practice would have made me spend more time to get the same results.
