Make simple things simpler?

# Junior def count_living_per_year(population): living_per_year = {} for birth, death in population: for year in range(birth, death): if year in living_per_year: living_per_year[year] += 1 else: living_per_year[year] = 1 return living_per_year # Senior def count_living_per_year(population): return collections.Counter( year for birth, death in population for year in range(birth, death) )

I often encounter juniors during training sessions or when I join a new team.

They all have in common a simple, naive way to code. Is that bad in itself? I don’t think so. I prefer the simplicity and efficiency of FastAPI over the convoluted 100% DRY design of DjangoRestFramework. I prefer the naive and efficiency of Python over the complexity of C++, the boilerplate OOP required code of Java or the strange and destabilizing syntax of Lisp.

For seasoned python developers, are you confident when you start a pull-request review and noticed a MetaSomething(type) and a Something(meta=MetaSomething)? I’m not.

So, there is something great with beginners code. Simplicity. But it’s fake simplicity, and there is something better than that: True simplicity.

Fake simplicity

A beginner’s code is simple mostly because it describe everything it does at a very low/granular level. Let’s see an example:

def count_living_per_year(population: list[tuple[int, int]]) -> dict[int, int]:
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            if year in living_per_year:
                living_per_year[year] += 1
            else:
                living_per_year[year] = 1
    return living_per_year

This code return the number of person in a population that are alive at the end of a year. You can use this doctest to test the code yourself if you want:

    >>> count_living_per_year([(2000, 2003), (2001, 2002)]) == {2000: 1, 2001: 2, 2002: 1}
    True

As I said, this function described the algorithm at a very low level, doing everything. In the two nested for loops, it test if a year is in the dictionary, create the key and set the value at 1 if not, increment by one otherwise.

Great, but that’s a fake simplicity. The first downside I can mention is the risk of the writer.

We are humans

I am human, and I do mistakes. And other people too. So when you see a code like this, written by a beginner, there is a LOT of room for mistakes. And a lot of room for mistakes is worse than very little room for mistake, isn’t it?

Risk mitigation

Of course, risk of mistakes can be mitigated with automated testing, but if you want to rely on that only, you need to be sure of:

  1. Is the test for this feature already done? By who?
  2. 100% trust in the code coverage (Even 100% doesn’t guaranty that 100% of the code is actually covered, we’ll talk about that in the future),
  3. 100% branch coverage,
  4. 100% trust in the test correctness (Hard to reach, are you sure everyone who works on the test didn’t run the code, copy-paste the result as the expected output, without checking if it was correct?).

If you don’t, then you have to review the code, and spend some time/energy to make sure there is nothing wrong. And when you do, I guess that you prefer a code where risk of mistakes are low and where mistakes are easy to spot.

How to make simple things even simpler?

How to make something simple even simpler? Easy: don’t do it. If you don’t do something, it can’t be complicated, right? If you don’t do something, it “cannot not” works. Yes, not doing something yourself remove the risk of failing doing that thing right.

This sound stupid to you? Then read what follow carefully.

Step 1, remove the if statement

In the previous code, an if statement was done to detect if a specific year was already in the dictionary:

if year in living_per_year:
    living_per_year[year] += 1
else:
    living_per_year[year] = 1

Let’s list what could go wrong here:

  • Inverting the condition and the corresponding operation (invert line 2 with line 4), or write if year not in living_per_year instead.
  • logical operator anti-pattern (if not year in living_per_year instead of if year not in living_per_year).
  • Typo in the augmented assignment e.g. ==, or in the value assigned if it’s a new year e.g. living_per_year[year] = 0.

So if you don’t code this if, you can’t make those mistakes (and we all do mistakes, not matter our seniority, we can be tired, have a few seconds of inattention, etc…). But how to get rid of the if statement? By using .get() to get a value, or a default one if the key is not in the dictionary:

def count_living_per_year(population):
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            living_per_year[year] = living_per_year.get(year, 0) + 1
    return living_per_year

Great! But… Hey, there is still a lot of room for errors here!

  • What if you keep something close to the original code and use 1 as a default value? Whoops.
  • What if you forget to change the augmented assignment (+=) with the simple assignment (=)? Crash, Whoops again.

So…

Step 2, remove the dict.get() call

You want to keep the content of the for loop as simple as possible, to reduce risks. To do that, you can create a dictionary who will automatically create a default value if the required key is not already in the dictionary.

Did you ever heard about collections? It’s a Standard Library Package, meaning it’s installed in Python that contains specials data-structures. And defaultdict? It’s a type of dictionary (in the package collections) that accept a default value factory (i.e. something that create a default value, e.g. a function, a class, or a primitive type like int or list) used when a key is missing.

Let’s see how this can helps you:

def count_living_per_year(population):
    living_per_year = collections.defaultdict(int)
    for birth, death in population:
        for year in range(birth, death):
            living_per_year[year] += 1
    return living_per_year

Hey, it looks like the first version, but without the if! Yes, it does. You add 1 for every year, leaving the rest to Python. Instead of implementing the key presence check yourself, you let Python do that for you. And trust me, it’s way more reliable than any human (not perfect, but “more” is already better!).

Final step, let Python do… Everything

So, now, you have a function that count how many times a specific year appears in this double iteration and store the result in a dict, letting the check of the key presence in the dictionary to Python.

It’s not your responsibility anymore, you don’t have to write, test, read or maintain the code that does that, the Python team take care of that for you (consider sponsoring this team here!).

But… What do you do? You count how many times something appears and want a dictionary at the end? Hum… remember the collections package? Let’s see if there is something else useful inside.

Hum, collections.Counter, seems interesting, right? Take a few minutes to look at the documentation here.

It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values.

That’s exactly what you are doing. So why are you doing it yourself instead of letting Python doing the job for you?

def count_living_per_year_(population: list[tuple[int, int]]) -> dict[int, int]:
    return collections.Counter(
        year
        for birth, death in population
        for year in range(birth, death)
    )

“Wait, what is this thing, with the two for loops on two lines in a weird way?”

This is called a generator expression. To understand what it is, you need to know what is a generator. In a nutshell, in this case, it’s like a list comprehension, but it doesn’t create a list, just allows iteration over itself for one run (what collections.Counter will do). More information about that here.

“Ok, so… What is a list comprehension?”

Here some reading for you here.

Now that you have learn many new things… Let’s run the test again:

    >>> count_living_per_year([(2000, 2003), (2001, 2002)]) == {2000: 1, 2001: 2, 2002: 1}
    True

Great. It works. And is this new code more complex that the original one? Well, the two for loops is in a different place and might look weird if you aren’t used to list comprehension/generator expression syntax, but now, you do nothing except describing how to iterate over the population. Python is in charge of everything else.

We started here:

def count_living_per_year(population: list[tuple[int, int]]) -> dict[int, int]:
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            if year in living_per_year:
                living_per_year[year] += 1
            else:
                living_per_year[year] = 1
    return living_per_year

And ended up here:

def count_living_per_year_(population: list[tuple[int, int]]) -> dict[int, int]:
    return collections.Counter(
        year
        for birth, death in population
        for year in range(birth, death)
    )

Conclusion: Step by Step

This was an example of how you can delegate your work to Python and other tiers packages (more-itertools is an interesting one!) in order to reduce the usage of error-prone code, simplify the code, and focus on the task, not its implementation.

Yes, doing that require a great knowledge of Python and its ecosystem. But nothing prevent you to follow that path step-by-step. Start with simple things like .get() methods, then when you get used to this tool, go one step further, etc.

Keep in mind who will be asked to maintain the code too. Juniors? Seniors? Seasoned pythoneers? Try not to be too smart.

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”

Brian W. Kernighan

If you can, teach and share your knowledge among your peers. If you can’t, try to set the required knowledge just above theirs. By doing so, your code will be easily understood after a quick documentation check, and the code base will thanks you.

Check out those recommended articles:

2 responses to “Make simple things simpler?”

  1. I like short code but actually prefer the Junior version in this case…

    Anyone that don’t understand Python but has knowledge of imperative language can quickly and easily read the Junior version.

    The senior version requires you to learn Python special syntax and look up what collection.Counter() does

    1. The Senior version isn’t better. It’s what seniors usually code when they got experience with Python. Why? Mostly because it’s way easier to rely on tools that are already there to do the job for you. That’s why we usually see enumerate() used over range(len()). The Senior version is considered more “Pythonic”, but if you value other metrics like how easy it can be understood for a non-python developer, then you may prefer the first version. The “junior/senior” comparison is mostly a click-bait to talk about more in-depth subject.

      If you think about it, even the python for-loop syntax is pretty python-specific. Many other languages closer to C don’t have a “for-each” syntax. So we can be extreme and consider that for-loop in the form of for-each is too python specific and prefer for i in range(len()). Same for a dict. Many languages doesn’t have hashmaps, and we can do the job with a preallocated list and use the year as an index. We can go very far in terms of “language agnosticism”, and it’s always a matter of how much. Pythonic code is on the non-agnostic side of the spectrum.

Leave a Reply

Discover more from The Way of Python

Subscribe now to keep reading and get access to the full archive.

Continue reading