Test First vs Last: False Positive
How test results can deceive us and how the sequence of writing tests affects it.
This is the latest issue of my newsletter series, ‘Test First vs Last’, where we’ll explore how the sequence of writing tests affects our minds. The full context of this series can be found in the introduction post.
I picked up a small feature to work on. It seemed simple at first, so I started to write the code. After a few hours, the production code was ready.
Being diligent, I made sure to write a test for the feature:
it("must return truthy", () => {
expect(newFeat())
.resolves
.toBeTruthy()
})
The test result was green: my code worked! I pushed the change to the deployment pipeline, which also showed green for the newly written test. I called it a day, satisfied with my work.
The next day, our support team reached out and told me that many users had started using this new feature, but it didn’t work at all! Wait, I tested this, didn’t I? The test was green.
Two green reasons
As I dug into the issue, I discovered the new feature was indeed broken, but why did the test not fail? It turned out I had missed the ‘await’ operator. Because the assertion was never awaited, the test function returned before the promise resolved, so the failing assertion was never reported. Once I added the operator, my test failed. I had experienced a false positive test.
it("must return truthy", () => {
await expect(newFeat())
.resolves
.toBeTruthy()
})
Reflecting on my false positive test experience, I realised that there are two potential reasons why a test might show “green”:
The code is working
The test is incorrect (false positive)
When I write my test last, I often expect the code to work and overlook the possibility of the test being incorrect, which gives me a false sense of security.
Narrowing down to one reason
To eliminate the risk of false positives, it’s important to see the test fail: you need to see the test red. To see a red test, you have to deliberately introduce a bug into your code, thereby breaking it. The steps might look like this (with a code sketch after the list):
Write code
Write tests (green)
Break the code to make tests red
Change the code back to green
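To make step 3 concrete, here is a rough Jest sketch. The newFeat implementation and its return value are illustrative assumptions, not the actual feature code from this post.

// Working implementation (illustrative assumption, not the actual feature)
const newFeat = async () => {
  return { enabled: true } // resolves to a truthy value
}

// Step 3: deliberately break the code, e.g. swap in
// const newFeat = async () => null // resolves to a falsy value

it("must return truthy", async () => {
  // With the await in place, the broken version above must turn this test red.
  // If it stays green despite the broken code, the test itself is a false positive.
  await expect(newFeat())
    .resolves
    .toBeTruthy()
})

Once you have seen the red run, revert the deliberate break (step 4) and confirm the test goes back to green.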
Compare this with writing tests first: the production code doesn’t exist yet when you write your test. The absence of the production code removes the “code is working” explanation for a green, passing test. This means that if your test is green, the only possible explanation is “the test is incorrect”. The steps for the test-first approach are therefore shorter (sketched in code after the list):
Write failing tests (red)
Write minimal code (red)
Make the test pass (green)
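In code, the same example in a test-first order might look like the sketch below. The step comments describe what you would expect to see at each point; the final implementation is an assumption, and in practice newFeat would live in its own module and be imported into the test file.

// Step 1 (red): write the test first. Running it now fails because
// newFeat does not exist yet.
it("must return truthy", async () => {
  await expect(newFeat())
    .resolves
    .toBeTruthy()
})

// Step 2 (still red): write minimal code. The promise resolves to undefined,
// which is not truthy, so the awaited test stays red. If the test turned green
// at this point, as the earlier version without await would, you would have
// just caught a false positive.
const newFeat = async () => {};

// Step 3 (green): replace the stub with the real implementation so that the
// promise resolves to a truthy value, turning the test green.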
2023-07-13 Edit:
A reader has raised an important point in the comments; do read it and join the discussion if you have thoughts. I added “Write minimal code” as a step for the test-first approach, as otherwise you’ll get false positives even in the test-first approach with this example! The minimal code may look like this:
const newFeat = async () => {};
I have also removed “Refactoring” as irrelevant to this post.
Although the ‘test last’ approach involves more steps and can be inefficient, you can get pretty fast at executing them once you are used to it. However, the challenge lies in the potential for human error, especially forgetting to deliberately introduce a bug. When you forget this step, you increase the risk of encountering false positives.
In contrast, the ‘test first’ approach means you have fewer things to remember, as it automatically prompts you with a failing test first. Having fewer things to remember is beneficial for us engineers (see my other essay: Remove duplication to deliberately forget).
While you might assume that the likelihood of encountering a false positive is low, it’s important to remember that these false positives can significantly impact developer productivity.
The impact on productivity
The three dimensions of the DevEx framework highlight what we can focus on to improve developer productivity: cognitive load, flow state, and feedback loop. When I experience a false positive test, all these dimensions could be affected:
Flow state: When reality doesn’t match our expectations, we experience negative emotions and frustration, the antithesis of flow and joy. Here, I had tested the feature and expected it to work, yet it was broken in production.
Feedback loop: A feedback loop containing a false positive means I don’t actually have feedback from the test. In this example, my feedback loop was from the support team, not the test.
Cognitive load: When I hear from the support team, I might have moved on to a different task. The need for a sudden context switch can lead to an increased cognitive load.
Conclusion
So what does all this mean for ‘Test First vs Last’? As we’ve seen, the test-first approach makes you less likely to forget to eliminate false positives, given that the production code has not been written yet, and it gets you there in fewer steps. Reducing the likelihood of false positives matters because they can significantly affect developer productivity.
While fixing the broken feature, my product manager reached out to me and told me that what’s been built was not what he had in mind. That’s frustrating! In my next post, I’ll cover how testing first vs last may affect this scenario.
Big fan of breaking a test to ensure I'm testing what I think I am. Great way to avoid false positives without having to write your tests first/ follow Test Driven Development, which I've always found tedious. Thanks for eloquently putting this process into words!
Apologies if I am missing something here, but in your example, assuming you wrote your test first and it initially failed because there was no implementation and then you added the same implementation you wrote earlier and the test (without the await) passed, haven’t you ended up in the same place (false positive test)?
I fully agree with the idea that you need to see a failing test, but don’t you still have to prove to yourself that the test is failing for the reasons you expected it to fail?
In the case of a subtle bug in the test like this one, does test first really address the problem?