A Little Copying Is Better Than a Little Coupling

Sandi Metz:

Duplication is far cheaper than the wrong abstraction.

In software design, we’re taught to avoid duplication, but trying too hard to avoid it can create a much worse problem: coupling. A little duplication is easy to manage, but a little coupling can slow a team down for years.

I've seen many occasions where people have tried to "share" as much code as possible (to keep their code and their PRs short). Python's inheritance patterns can make it very tempting to join "similar" things together on the basis that they can easily be split apart if they diverge. Unfortunately, existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary. We know that code represents effort expended, and we are very motivated to preserve the value of this effort. Consequently, when we work with someone else's code, we try to preserve the abstraction and expand upon it instead of splitting it apart. As a result, we see "if" conditionals sprinkled through the code to handle the cases the abstraction doesn't fit.
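As a rough illustration of that drift, here is a hypothetical shared helper (the name `format_report` and the `kind` parameter are invented for this sketch, not taken from any real codebase). It began life formatting one report, and every new caller added another branch:

```python
def format_report(rows, kind="sales"):
    # Hypothetical shared helper: each "if kind == ..." marks a place where
    # the abstraction did not quite fit the second caller.
    header = "Sales report" if kind == "sales" else "Inventory report"
    lines = [header]
    for name, amount in rows:
        if kind == "sales":
            lines.append(f"{name}: ${amount:.2f}")
        else:
            # Inventory amounts are whole units, not currency.
            lines.append(f"{name}: {amount} units")
    if kind == "inventory":
        lines.append(f"Total items: {sum(amount for _, amount in rows)}")
    return "\n".join(lines)


print(format_report([("Widget", 9.5)]))
print(format_report([("Widget", 3)], kind="inventory"))
```

Two small, separate functions would contain some repeated lines, but neither would need to know that the other exists.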

What is even more fascinating is the original DRY quote from The Pragmatic Programmer, the one that inspired the push to reduce duplicate code:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

I'm not quite sure how we got from the above definition to "these functions look similar, let's just define them once". The DRY principle clearly relates to "knowledge". The revised edition of The Pragmatic Programmer even goes to great lengths to correct this misreading and provides an example where duplication makes perfect sense because the code represents two separate ideas.

# validate_type and validate_min_integer are shared helpers assumed to be defined elsewhere.

def validate_age(value):
    validate_type(value, int)
    validate_min_integer(value, 0)

def validate_quantity(value):
    validate_type(value, int)
    validate_min_integer(value, 0)

Despite having exactly the same body, these two functions represent two different ideas, and as such they are kept separate. Keeping them separate allows each to be extended independently and prevents two distinct ideas from becoming muddled later on.

If two things look similar but mean different things, duplication expresses that intent clearly.
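To make that concrete, here is a minimal runnable sketch of how the two validators might drift apart. The helper implementations and both new rules (a maximum age of 150, a minimum quantity of 1) are assumptions made up for illustration; they are not from the book.

```python
def validate_type(value, expected_type):
    # Assumed helper: reject values that are not of the expected type.
    if not isinstance(value, expected_type):
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(value).__name__}"
        )


def validate_min_integer(value, minimum):
    # Assumed helper: reject values below the minimum.
    if value < minimum:
        raise ValueError(f"{value} is below the minimum of {minimum}")


def validate_max_integer(value, maximum):
    # Assumed helper: reject values above the maximum.
    if value > maximum:
        raise ValueError(f"{value} is above the maximum of {maximum}")


def validate_age(value):
    validate_type(value, int)
    validate_min_integer(value, 0)
    # Hypothetical new rule: ages above 150 are treated as data-entry errors.
    validate_max_integer(value, 150)


def validate_quantity(value):
    validate_type(value, int)
    # Hypothetical new rule: an order must contain at least one item.
    validate_min_integer(value, 1)


validate_age(0)         # still valid: a newborn
validate_quantity(200)  # still valid: quantities have no upper bound here
```

Because the functions were never merged, each rule change touched exactly one function, and neither needed a flag to say which idea it was validating.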

Something I didn't want to leave out...

There's another fab example in the book stating that you shouldn't repeat information in data.

class Line {
    Point start;
    Point end;
    double length;
};

Here we have duplicated the "length" information, as it can be inferred from the start and end values. By storing start, end and length, we can obtain the same information from multiple places (instead of having one unambiguous, authoritative representation). This becomes especially apparent if someone erroneously changes one of the values without updating the others...
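One way to remove that duplication is to derive the length on demand instead of storing it. Below is a minimal Python sketch of the idea. The `Point` fields `x` and `y` are assumptions, since the snippet above does not show how `Point` is defined.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    # Assumed two-dimensional point; the original snippet leaves Point undefined.
    x: float
    y: float


@dataclass
class Line:
    start: Point
    end: Point

    @property
    def length(self) -> float:
        # Computed from start and end each time, so it cannot fall out of sync.
        return math.hypot(self.end.x - self.start.x, self.end.y - self.start.y)


line = Line(Point(0, 0), Point(3, 4))
print(line.length)

# Moving an endpoint needs no extra bookkeeping; length follows automatically.
line.end = Point(6, 8)
print(line.length)
```

The trade-off is a recomputation on every access. If that ever mattered, caching the value would be an optimisation inside the class, and the start and end points would remain the single authoritative representation.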