Remove duplication to deliberately forget
Exploring the limits of Don't Repeat Yourself (DRY) and the Rule of Three, and an alternative way to think about code duplication from a psychological perspective.
Each and every step we software engineers make, however small, requires judgement. Deciding whether to remove code duplication is one of those tiny steps. “Should this duplicated code be refactored?” is a question we often have to answer. For many years I have been guided by principles such as DRY and the Rule of Three to answer it, and I assume many of you have too. This post explores the limits of these principles as I experienced them, and suggests an alternative way to think about duplication.
DRY is one of the first principles I learnt to tidy up my code, introduced very early in my career. “Remove all duplications you can see, do not copy-paste your code”, they say. With this oversimplified definition of DRY, the answer to the question “Should this duplicated code be refactored?” is a resounding yes, regardless of the context.
(For brevity, this question will be shortened to “Refactor this duplicate?” for the rest of this post).
Whilst this oversimplification of DRY kept me going for a while, it has a limit. Refactoring duplication prematurely risks introducing the wrong abstraction, which can cost more to maintain than the duplication it replaced.
I suspect many software engineers started with this definition too, and discovered the rule of three along the way. The rule of three states, “Three strikes and you refactor”. It is a rule that emphasises how we shouldn’t refactor duplications prematurely, addressing the limitations of the oversimplified DRY definition.
“Refactor this duplicate?”, yes when we have three copies of similar code. We’re now in a better place to make a better judgement.
However, there are scenarios where I would be confident in refactoring two instances of similar-looking code. Either I’m confident that it’s the right abstraction, or it’s just plain duplication that I can safely remove.
Let’s take a simple example:
/**
* Marks a task as done.
*/
const markTaskAsDone = (task) => {}
In this scenario, the comment on top of the function is a duplication, and I will remove it immediately because the function’s intent is already clearly expressed by its name.
You might argue that the whole idea of the rule of three is to avoid premature abstraction, so it doesn’t apply here. In practice, though, the rule tends to get muddled. When I remove this duplication, I hear: “Wisen, I thought you said three strikes, then refactor. We only have two strikes here!”
We need some help, and this is when I draw on the original definition of DRY to justify myself:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
This original definition is very different from the oversimplification above. The crucial word to digest here is knowledge. Let’s reattempt our tiny step with the same question.
“Refactor this duplicate?”, yes, that comment is a knowledge duplication. We can now make a better judgement.
This definition, though, can be hard to understand for more complicated examples, and I have battled with this original definition for a long time. Let’s take an example from the Pragmatic Programmer book, where the DRY principle was formulated:
def validate_age(value):
  validate_type(value, :integer)
  validate_min_integer(value, 0)

def validate_quantity(value):
  validate_type(value, :integer)
  validate_min_integer(value, 0)
In this example, the repeated calls to validate_type and validate_min_integer are not knowledge duplication. The two functions merely happen to share the same rules today; it is a coincidence, so we don’t refactor it. Not all code duplication is knowledge duplication.
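To make the coincidence a little more concrete, here is a minimal JavaScript sketch of the same two validators. The helper bodies and the upper bound on age are my own assumptions for illustration; they are not from the book. The point is only that age and quantity can gain rules independently of each other.

```javascript
// Hypothetical helpers: the names mirror the book's example, the bodies are assumed.
const validateType = (value, type) => {
  if (type === "integer" && !Number.isInteger(value)) {
    throw new TypeError(`expected an integer, got ${value}`);
  }
};

const validateMinInteger = (value, min) => {
  if (value < min) {
    throw new RangeError(`expected at least ${min}, got ${value}`);
  }
};

const validateMaxInteger = (value, max) => {
  if (value > max) {
    throw new RangeError(`expected at most ${max}, got ${value}`);
  }
};

// Quantity keeps the original two rules.
const validateQuantity = (value) => {
  validateType(value, "integer");
  validateMinInteger(value, 0);
};

// Age started out identical, then gained its own rule.
const validateAge = (value) => {
  validateType(value, "integer");
  validateMinInteger(value, 0);
  validateMaxInteger(value, 150); // assumed later requirement; quantity is unaffected
};
```

Had the two functions been merged into one shared validator, this change would have forced either a flag or a split back into two functions.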
At least, that’s how the conversation should normally end. The curious ones will ask a follow-up question: “How do I tell whether it’s knowledge duplication or a coincidence?” Well, young Jedi, you will be able to tell once you’re experienced enough. Worse, the philosophical question you might get next is, “What is knowledge?”
That’s not good. What I need is a way of thinking that helps us make a good judgement every day. The rule of three has proved quite rigid, and DRY quite hard to apply appropriately. I need a constraint that is easy to apply but still produces a good result: a middle ground.
Reflecting on my recent work, when I don’t refactor duplication, I have moments of “Ah, I also need to remember to update X, Y, and Z”. That act of remembering is distracting and error-prone. What if I forget to update Y? What was I doing again?
What’s wrong with remembering things, you may ask?
We have working memory limits, and we’d like to make sure that we populate our working memory with things that are relevant to us at that point in time. If we’re populating our working memory with “I need to remember X, Y, and Z”, we are wasting our precious working memory.
When I remove duplication, I realise I no longer have to remember it. This has become a great heuristic for me. My motivation for removing code duplication shifted from how the code looks to what will happen to me in the future.
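As a small illustration of the “remember X, Y, and Z” problem, here is a sketch in JavaScript. The title limit and all function names are hypothetical, made up for this example.

```javascript
// Before: the limit 80 lives in three places, so changing it means
// remembering to update all three.
const isTitleValidBefore = (title) => title.length <= 80;
const truncateTitleBefore = (title) => title.slice(0, 80);
const titleHintBefore = () => "Titles can be up to 80 characters long.";

// After: one authoritative value (hypothetical limit). Change it here and
// the three usages below can be forgotten.
const MAX_TITLE_LENGTH = 80;
const isTitleValid = (title) => title.length <= MAX_TITLE_LENGTH;
const truncateTitle = (title) => title.slice(0, MAX_TITLE_LENGTH);
const titleHint = () =>
  `Titles can be up to ${MAX_TITLE_LENGTH} characters long.`;
```

In the second version there is nothing left to remember when the limit changes, which is exactly the outcome the heuristic is after.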
Optimising what we hold in our working memory matters greatly to me, because we should base our design choices on our biological limits. The keyboard you see today is designed the way it is because we have ten fingers.
This heuristic, the act of forgetting and remembering, also brings in the aspect of time. Time is the primary difference between programming and software engineering. We need to adapt to changing needs over time, and thinking about what is likely to come in the future helps us make a better judgement.
Let’s look at how we can apply this way of thinking to our first example.
/**
* Marks a task as done.
*/
const markTaskAsDone = (task) => {}
“Refactor this duplicate?”, yes, I don’t want to remember updating that comment when my function name changes in the future. Simple.
The second example, where we have validate_age and validate_quantity, is a trickier one. Let’s look at some possible responses:
I’m not sure. They’re sitting close together now, which might make them easy to remember?
No, I don’t need to remember them; these two functions will change independently.
Yes, there is a change requested for next week that will make them similar again. (You can respond with YAGNI to this person.)
What I like about these potential responses is that you are not going to get stuck in a debate about what knowledge is and what it is not. You also won’t get stuck in a debate over two instances versus three (the rule of three). This way of thinking doesn’t take away the engineer's judgement either; it shifts the focus to the context they’re currently in.
What do you think? Will this way of thinking help improve our relationship with code duplication? I might need some help coming up with a cool acronym, but for now, remember this: “Remove duplication to deliberately forget”.
I started this post by describing the relationship between software engineers and code duplication, and I will end it by describing the relationship I had with forgetfulness. The issue I’ve faced with this way of thinking is that some people have a good relationship with their forgetfulness, and some do not.
I used to hate myself when I forgot to do something. I told myself, argh, I forgot to bring that package to the post office again when I went out. So forgetful, now that package is not sent, again!
You may associate forgetting with something bad, like memory loss. Even though forgetting sounds like something you'd like to cure, it is actually a healthy process that you'd like to keep. Forgetting is our brain inhibiting memories that are irrelevant in the current moment. Imagine holding all of that irrelevant information in mind at once: we wouldn't be able to focus on anything at all.
Solomon Shereshevsky is a famous figure in this area of psychology. Even though he had a remarkable memory, it didn’t come without a trade-off. He was diagnosed with severe synaesthesia. He might be able to remember every word said in a movie, or the words of a poem, but he wouldn't be able to understand their meaning. He couldn't even order an ice cream, as one of his stories goes:
One time I went to buy some ice cream ... I walked over to the vendor and asked her what kind of ice cream she had. 'Fruit ice cream,' she said. But she answered in such a tone that a whole pile of coals, of black cinders, came bursting out of her mouth, and I couldn't bring myself to buy any ice cream after she had answered in that way …
No, thank you. I’d rather remember less and be in the moment. What I do now is put that package by the front door, and it’ll serve as a visual cue for me to remember the next time I go out.
Forgetting is a feature, not a bug. Deliberately forget.