I often encounter juniors during training sessions or when I join a new team.
They all share a simple, naive way of coding. Is that bad in itself? I don’t think so. I prefer the simplicity and efficiency of FastAPI over the convoluted, 100% DRY design of Django REST framework. I prefer the naivety and efficiency of Python over the complexity of C++, the OOP boilerplate that Java requires, or the strange and destabilizing syntax of Lisp.
For seasoned Python developers: are you confident when you start a pull-request review and notice a MetaSomething(type) and a Something(metaclass=MetaSomething)? I’m not.
So, there is something great about beginners’ code: simplicity. But it’s fake simplicity, and there is something better than that: true simplicity.
Fake simplicity
A beginner’s code is simple mostly because it describes everything it does at a very low, granular level. Let’s see an example:
def count_living_per_year(population: list[tuple[int, int]]) -> dict[int, int]:
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            if year in living_per_year:
                living_per_year[year] += 1
            else:
                living_per_year[year] = 1
    return living_per_year
This code returns the number of people in a population who are alive at the end of each year. You can use this doctest to test the code yourself if you want:
>>> count_living_per_year([(2000, 2003), (2001, 2002)]) == {2000: 1, 2001: 2, 2002: 1}
True
As I said, this function describes the algorithm at a very low level, doing everything itself. Inside the two nested for loops, it tests whether a year is in the dictionary, creates the key with a value of 1 if it is not, and increments the value by one otherwise.
Great, but that’s fake simplicity. The first downside I can mention is the risk of mistakes by the writer.
We are humans
I am human, and I make mistakes. So do other people. So when you see code like this, written by a beginner, there is a LOT of room for mistakes. And a lot of room for mistakes is worse than very little room for mistakes, isn’t it?
Risk mitigation
Of course, the risk of mistakes can be mitigated with automated testing, but if you want to rely on that alone, you need to be sure of:
- the existence of a test for this feature (Is it already written? By whom?),
- 100% trust in the code coverage (even 100% doesn’t guarantee that 100% of the code is actually covered, we’ll talk about that in the future),
- 100% branch coverage,
- 100% trust in the test correctness (hard to reach: are you sure that nobody who worked on the test ran the code and copy-pasted the result as the expected output without checking that it was correct?).
If you aren’t, then you have to review the code and spend some time and energy making sure there is nothing wrong. And when you do, I guess you prefer code where the risk of mistakes is low and where mistakes are easy to spot.
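To illustrate the last point of that list, here is a minimal sketch of a test that reaches full line and branch coverage while verifying nothing. The buggy function and the test are hypothetical, written only for this example:

```python
def count_living_per_year_buggy(population):
    """Hypothetical buggy variant: a new year starts at 0 instead of 1."""
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            if year in living_per_year:
                living_per_year[year] += 1
            else:
                living_per_year[year] = 0  # bug: should be 1
    return living_per_year


def test_copy_pasted_expectation():
    # The expected value was copy-pasted from the function's own output,
    # so the test passes and both branches run, bug included.
    result = count_living_per_year_buggy([(2000, 2003), (2001, 2002)])
    assert result == {2000: 0, 2001: 1, 2002: 0}


test_copy_pasted_expectation()  # passes silently
```

The coverage report would be perfect here, yet every count is one too low.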
How to make simple things even simpler?
How do you make something simple even simpler? Easy: don’t do it. If you don’t do something, it can’t be complicated, right? If you don’t do something, it “cannot not” work. Yes, not doing something yourself removes the risk of failing to do that thing right.
Does this sound stupid to you? Then read what follows carefully.
Step 1, remove the if statement
In the previous code, an if statement checked whether a specific year was already in the dictionary:
if year in living_per_year:
    living_per_year[year] += 1
else:
    living_per_year[year] = 1
Let’s list what could go wrong here:
- Inverting the condition and the corresponding operation (swapping line 2 with line 4), or writing if year not in living_per_year instead.
- Logical operator anti-pattern (if not year in living_per_year instead of if year not in living_per_year).
- Typo in the augmented assignment, e.g. ==, or in the value assigned if it’s a new year, e.g. living_per_year[year] = 0.
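To make the first item of that list concrete, here is a small sketch of the inverted condition. The function name is hypothetical and the bug is deliberate:

```python
def count_living_per_year_inverted(population):
    """Hypothetical variant where the condition was inverted by mistake."""
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            if year not in living_per_year:  # inverted condition
                living_per_year[year] += 1   # the key is missing: KeyError
            else:
                living_per_year[year] = 1
    return living_per_year


try:
    count_living_per_year_inverted([(2000, 2003)])
except KeyError as error:
    print(f"KeyError: {error}")  # fails on the very first year
```

This one at least crashes loudly. The wrong initial value (0 instead of 1) is sneakier, because the function still returns a plausible-looking dictionary.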
So if you don’t write this if, you can’t make those mistakes (and we all make mistakes, no matter our seniority: we can be tired, have a few seconds of inattention, etc.). But how do you get rid of the if statement? By using .get() to fetch a value, or a default one if the key is not in the dictionary:
def count_living_per_year(population):
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            living_per_year[year] = living_per_year.get(year, 0) + 1
    return living_per_year
Great! But… Hey, there is still a lot of room for errors here!
- What if you keep something close to the original code and use 1 as the default value? Whoops.
- What if you forget to replace the augmented assignment (+=) with a simple assignment (=)? Crash. Whoops again.
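Both pitfalls can be sketched in a few lines. The two function names below are hypothetical and the bugs are deliberate:

```python
def count_with_wrong_default(population):
    """Hypothetical variant: the default value is 1 instead of 0."""
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            living_per_year[year] = living_per_year.get(year, 1) + 1
    return living_per_year


def count_with_leftover_augmented_assignment(population):
    """Hypothetical variant: "+=" was left in place of "="."""
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            # "+=" has to read living_per_year[year], which does not exist yet
            living_per_year[year] += living_per_year.get(year, 0) + 1
    return living_per_year


population = [(2000, 2003), (2001, 2002)]

# No crash, but every count is one too high.
print(count_with_wrong_default(population))

try:
    count_with_leftover_augmented_assignment(population)
except KeyError as error:
    print(f"KeyError: {error}")
```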
So…
Step 2, remove the dict.get() call
You want to keep the body of the for loop as simple as possible, to reduce risks. To do that, you can use a dictionary that automatically creates a default value when the requested key is not already in it.
Have you ever heard of collections? It’s a standard library module, meaning it ships with Python, and it contains specialized data structures. And defaultdict? It’s a type of dictionary (in the collections module) that accepts a default value factory (i.e. something that creates a default value when called without arguments, e.g. a function, a class, or a built-in type like int or list), used when a key is missing.
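Here is a short sketch of how the factory behaves, using made-up keys and values:

```python
import collections

# int() returns 0, so a missing key starts counting from 0.
counts = collections.defaultdict(int)
counts["missing"] += 1  # no KeyError

# list() returns [], so a missing key starts as an empty list.
groups = collections.defaultdict(list)
groups["even"].append(2)

# Any callable that takes no argument works as a factory.
starting_at_ten = collections.defaultdict(lambda: 10)
starting_at_ten["x"] += 1

print(dict(counts), dict(groups), dict(starting_at_ten))
```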
Let’s see how this can help you:
import collections

def count_living_per_year(population):
    living_per_year = collections.defaultdict(int)
    for birth, death in population:
        for year in range(birth, death):
            living_per_year[year] += 1
    return living_per_year
Hey, it looks like the first version, but without the if! Yes, it does. You add 1 for every year, leaving the rest to Python. Instead of implementing the key presence check yourself, you let Python do that for you. And trust me, it’s way more reliable than any human (not perfect, but “more” is already better!).
Final step, let Python do… Everything
So now you have a function that counts how many times a specific year appears in this double iteration and stores the result in a dict, leaving the check for the key’s presence in the dictionary to Python.
It’s not your responsibility anymore: you don’t have to write, test, read or maintain the code that does that. The Python team takes care of that for you (consider sponsoring this team!).
But… What do you do? You count how many times something appears and want a dictionary at the end? Hum… remember the collections package? Let’s see if there is something else useful inside.
Hum, collections.Counter seems interesting, right? Take a few minutes to look at its documentation.
It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values.
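A quick sketch on made-up data shows what that sentence means in practice:

```python
import collections

votes = collections.Counter(["red", "blue", "red", "green", "red"])

assert votes["red"] == 3
assert votes["purple"] == 0  # a missing element counts as zero, no KeyError
assert votes.most_common(1) == [("red", 3)]

# Counter is a dict subclass, so it compares equal to a plain dict.
assert votes == {"red": 3, "blue": 1, "green": 1}
```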
That’s exactly what you are doing. So why are you doing it yourself instead of letting Python do the job for you?
import collections

def count_living_per_year(population: list[tuple[int, int]]) -> dict[int, int]:
    return collections.Counter(
        year
        for birth, death in population
        for year in range(birth, death)
    )
“Wait, what is this thing, with the two for loops on two lines in a weird way?”
This is called a generator expression. To understand what it is, you need to know what a generator is. In a nutshell, in this case, it’s like a list comprehension, but it doesn’t create a list: it only allows a single iteration over itself (which is what collections.Counter will do).
“Ok, so… What is a list comprehension?”
Here is some reading for you.
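In the meantime, here is a minimal side-by-side sketch of both syntaxes, reusing the population from the doctest:

```python
population = [(2000, 2003), (2001, 2002)]

# List comprehension: square brackets, builds the whole list in memory.
years_list = [
    year
    for birth, death in population
    for year in range(birth, death)
]

# Generator expression: same clauses in parentheses, items produced on demand.
years_gen = (
    year
    for birth, death in population
    for year in range(birth, death)
)

first_pass = list(years_gen)   # consumes the generator
second_pass = list(years_gen)  # already exhausted, nothing left

print(years_list)   # [2000, 2001, 2002, 2001]
print(first_pass)   # [2000, 2001, 2002, 2001]
print(second_pass)  # []
```

The for clauses read in the same order as the nested for loops of the original function: population first, then the years of each person.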
Now that you have learned many new things… Let’s run the test again:
>>> count_living_per_year([(2000, 2003), (2001, 2002)]) == {2000: 1, 2001: 2, 2002: 1}
True
Great. It works. And is this new code more complex than the original one? Well, the two for loops are in a different place and might look weird if you aren’t used to list comprehension/generator expression syntax, but now you do nothing except describe how to iterate over the population. Python is in charge of everything else.
We started here:
def count_living_per_year(population: list[tuple[int, int]]) -> dict[int, int]:
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            if year in living_per_year:
                living_per_year[year] += 1
            else:
                living_per_year[year] = 1
    return living_per_year
And ended up here:
import collections

def count_living_per_year(population: list[tuple[int, int]]) -> dict[int, int]:
    return collections.Counter(
        year
        for birth, death in population
        for year in range(birth, death)
    )
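If you want more reassurance than a single doctest, one possible check is to run both versions on randomly generated populations and compare the results. This is only a sketch: the names count_with_if and count_with_counter, the seed, and the value ranges are arbitrary choices made for this example.

```python
import collections
import random


def count_with_if(population):
    """First version: explicit key check."""
    living_per_year = {}
    for birth, death in population:
        for year in range(birth, death):
            if year in living_per_year:
                living_per_year[year] += 1
            else:
                living_per_year[year] = 1
    return living_per_year


def count_with_counter(population):
    """Final version: everything delegated to collections.Counter."""
    return collections.Counter(
        year
        for birth, death in population
        for year in range(birth, death)
    )


rng = random.Random(0)  # fixed seed so the run is reproducible
for _ in range(100):
    population = []
    for _person in range(rng.randint(0, 20)):
        birth = rng.randint(1900, 2000)
        population.append((birth, birth + rng.randint(0, 100)))
    assert count_with_if(population) == count_with_counter(population)
```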
Conclusion: Step by Step
This was an example of how you can delegate your work to Python and to third-party packages (more-itertools is an interesting one!) in order to reduce the amount of error-prone code, simplify the code, and focus on the task, not its implementation.
Yes, doing that requires a good knowledge of Python and its ecosystem. But nothing prevents you from following that path step by step. Start with simple things like the .get() method, then, when you are used to this tool, go one step further, and so on.
Keep in mind who will be asked to maintain the code too. Juniors? Seniors? Seasoned pythoneers? Try not to be too smart.
“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”
Brian W. Kernighan
If you can, teach and share your knowledge with your peers. If you can’t, try to set the required knowledge just above theirs. By doing so, your code will be easily understood after a quick documentation check, and the code base will thank you.
![# Junior def count_living_per_year(population): living_per_year = {} for birth, death in population: for year in range(birth, death): if year in living_per_year: living_per_year[year] += 1 else: living_per_year[year] = 1 return living_per_year # Senior def count_living_per_year(population): return collections.Counter( year for birth, death in population for year in range(birth, death) )](https://thewayofpython.com/wp-content/uploads/2023/05/counter.png)