Test First vs Last: Task Aversion and Confirmation Bias
How the sequence of writing tests may reduce the impact of confirmation bias.
This is the latest issue of my newsletter series, ‘Test First vs Last’, where we’ll explore how the sequence of writing tests affects our minds. The full context of this series can be found in the introduction post.
Apologies for the inactivity, as I’ve been away for the summer holiday to spend more time with my family. I suspect that I’ll be pretty busy in the upcoming weeks. Therefore, your support will help me commit to my writing!
In my previous essay of this series, the Mere-Exposure Effect, I wrote about how practising the test-first approach encourages me to pause, and then makes me more deliberate about the subsequent steps. When I practise test-last, though, I don’t pause, and the tests I write rarely discover bugs. Why does that happen? In this essay, we’ll explore why the test-last approach often leads to less effective testing due to task aversion and confirmation bias.
Four aversive scenarios
My job as a software engineer is to deliver value to my customers, and this is primarily done by shipping changes. When I write tests last, the test I’m writing feels like a chore. If what I’m changing has been tested manually, I’ll probably dread writing the tests even more.
After all, I’m done with the change, so why bother writing tests? Suddenly, looking at the finished code, writing tests feels hard. Maybe it’s not worth the effort. Oh, and that test framework I haven’t learned properly will make writing the test even tougher. I’m experiencing task aversion.
Typically, four scenarios may happen in my mind:
1. Experience aversion → Write superficial test → Ship code
2. Experience aversion → Write test → Threatened by red tests → Revert to scenario 3 or persevere
3. Experience aversion → Skip writing test → Ship code
4. Experience aversion → Write test → Ship code
When I write the test last, I mostly fall under scenario 1, 2, or 3. In the subsequent sections, I’ll explain some of the terms I’ve used here.
If you reach scenario 4 every single time, with no bugs after you ship, this essay is not for you. If you’re a mere mortal like me, though, let’s carry on.
Confirmation bias
Scenario 1: Experience aversion → Write superficial test → Ship code
What I mean by a superficial test is the kind of test that covers the current behaviour instead of the expected behaviour. Given my aversion, I must rush and ship, and my biases take over. Under the influence of confirmation bias, my tests affirm that the code executes as I initially programmed it, ignoring whether or not the application satisfies user needs or handles edge cases effectively.
Confirmation bias is a type of cognitive bias: the tendency to seek out information that confirms our preconceptions. My preconception when I write the test last is that my code works. Therefore, my bias leads me to write a test that covers that behaviour.
It’s an odd experience. The moment I have written my code, the code becomes my new expected behaviour, overriding what I had in my mind before I wrote my code.
Being aware that we have confirmation bias in all of us is a good first step in tackling this issue. We can deliberately spend time writing the tests and make sure we think about the expected behaviour—no more superficial tests.
Unfortunately, though, this is not the end of the problem caused by confirmation bias.
Red tests become a threat
Scenario 2: Experience aversion → Write test → Threatened by red tests → Revert to scenario 3 or persevere
I’m done with the code, pushed through my aversion, and now am writing the test rigorously, with the awareness of confirmation bias. This is the last mile of what I’m trying to deliver. I can see the end of the tunnel.
I write down how I expect the code to behave… Oops, my tests are red. This is not what I expected; I expected my code to work (confirmation bias at play). Surely it can’t be my code that’s wrong.
These red tests are threatening my glory! Frustration and a conflict of interest begin to crop up in my mind. Is this test worth writing? I’m feeling impatient. I expect to ship the change today, and I’ve told my team about this!
Sometimes, when this happens, I revert to scenario 3, where I skip writing the test altogether. Sometimes, I persevere with rationality and a sense of identity and complete my work properly. Even then, I write the test with a sense of rush and displeasure.
We all have confirmation bias
The feeling of threat comes from the conflict of information I perceive: my code works and is ready to be shipped, but the tests show otherwise. This sense of threat doesn’t only happen to software engineers, because confirmation bias can occur in anyone.
Scientists, like anyone else, can feel threatened by information or results that contradict their expectations or existing work. A great deal of emotional and intellectual investment goes into scientific research, and negative evidence that contradicts an existing hypothesis can feel like a personal failure, a feeling further intensified by time pressure and limited funding.
Even the best thinkers experience confirmation bias. What sets them apart is their awareness of this bias and having strategies to combat it. Charles Darwin, for example, had a “golden rule”1 to actively note any information contradicting his current beliefs. This strategy requires a significant amount of discipline.
While this kind of discipline works, it requires much mental effort to sustain. What I’m more interested in is having a system, a workflow, that can help me battle this bias without much mental effort.
If writing tests, an activity supposed to help us deliver software better, is becoming a threat to our work, it’s a sign that we have an ineffective approach to work.
Reversing the sequence
What I’ve found to be effective in tackling confirmation bias is simply reversing the sequence: writing tests first.
It’s worth noting that I still experience task aversion when I take the test-first approach, even after decades of practising TDD. But once I get past the initial reluctance, the rest of the development becomes smoother.
I don’t experience the sense of conflict between getting things shipped and ensuring that tests are written. That’s because I know I haven’t delivered the expected value yet: the application code has not been written. It’s clear that there are still more things I have to do. In contrast to the test-last approach, skipping the test or writing a superficial test is not a strong option on the table.
When I write a list of pending tests first, I also have a better sense of what’s left in my work. As I begin to write one test, more scenarios occur to me, and I add them to the list of pending tests.
And lastly, when I write tests first, the feeling of threat associated with failing tests decreases significantly. I don’t see failing tests as conflicting information, given that I expect to see them in the test-first approach. Red tests happen by design.
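That rhythm can be sketched in a few lines of Python (the `slugify` function and its behaviour are hypothetical, chosen only to illustrate the sequence): the test comes first and encodes the expected behaviour, the stub makes the test red by design, and the implementation follows to turn it green.

```python
# Step 1: write the test from the expected behaviour first.
def test_slugify():
    assert slugify("Test First vs Last") == "test-first-vs-last"

# Step 2: a stub just sufficient to run the test.
# At this point the test is red -- by design, not as a threat.
def slugify(title):
    raise NotImplementedError

# Step 3: implement until the test turns green.
# (Redefining the function here stands in for editing it in place.)
def slugify(title):
    return "-".join(title.lower().split())
```

Because the red test precedes the implementation, a failure carries no conflicting information: it simply marks the next piece of work.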
Conclusion
As software engineers, we face two points where confirmation bias impacts our work:
Tests do not reflect the expected behaviour, but how the code currently behaves.
Failing test results are perceived as a threat to progress.
Even the best thinkers experience confirmation bias. Therefore, we need to be aware of it and have an approach to tackle it. Tackling this bias with the test-last approach requires discipline and a readiness to revise your expectations when the tests turn red.
Adopting the test-first approach, on the other hand, reduces the cognitive load and minimises the negative impact of confirmation bias, making it a more effective approach.
Up next: I’ll cover how the reward mechanism differs when we write tests first vs last and how it affects us. Subscribe so it lands in your inbox.
Charles Darwin’s golden rule: Goodreads quote.
Hey Wisen,
When I used to write tests last, I didn't have task aversion, because I valued the feedback I would get on my work, as I too consider myself a mere mortal. However, where I did face task aversion was where the testing infrastructure was more complex than the code itself, and not for good reason, or when unit tests were created for god-like classes because of CI/CD code-coverage requirements.
So I think it's important for the application's test code to be in a good state, so that developers don't face this aversion and the metrics being tracked mean something.
Also, I think you read my mind here, "When I write a list of pending tests first, I also have a better expectation of what’s left in my work."
I've noticed that when practicing TDD, I become more confident in the scope of the work to be accomplished once my tests are written. Once the tests are written, I have a short feedback mechanism in place to guide my development, I have a deeper understanding of the business problem because I'm forced to think about it from the perspective of the user, and I'm more confident about the design because the tests will raise a red flag if it's hard to use.
I think acceptance-level tests are really what gave me that confidence: high-level tests that interact with the system like a client would. I was more confident that it would work as expected and that I wasn't missing something.
The acceptance test would ground me in reality by ensuring that I was implementing a solution to a real business problem, and the unit tests would let me iterate more granularly and achieve comprehensive correctness.