Learn Python with Rust: Counting String Occurrences
Let’s learn Python by looking at another language, Rust. Here is a Python pedagogical simplistic example that counts string occurrences in a list (could be done using Counter, I know), and the Rust counterpart (I know, could be done using into_iter and fold).
# Python
def foo(spam: list[str]) -> dict[str, int]:
counter = {}
for item in spam:
counter[item] = counter.setdefault(item, 0) + 1
return counter
// Rust
fn foo(spam: Vec<String>) -> HashMap<String, i32> {
let mut counter: HashMap<String, i32> = HashMap::new();
for item in spam {
*counter
.entry(item)
.or_insert(0) += 1;
}
counter
}
Data Structures: Dict and HashMap
First, we define a dict in Python or a HashMap in Rust. Python dict can take any type of value by default, but both structures are hashmaps, meaning they bind a hash to a value. The value can be of any kind, but the key must be hashable (implement a __hash__ magic method). Immutable types are usually hashable, but not always. For example, a tuple containing a list cannot be hashed (here is why).
In Rust, the HashMap must be declared as mut, meaning mutable. Python dict are mutable by default. To create an immutable dict, you can use a mapping proxy (or the tier library frozendict):
import types
immutable_dict = types.MappingProxyType({'a': 1, 'b': 2})
Code Comparison
Rust Code
counter.entry(item).or_insert(0) accesses the entry associated with the key item, and if it’s vacant inserts 0 into the entry and returns it. This is the important part: It change the value of the entry and return this entry at the same time.
In order to update the value, we use the wildcard * and += 1. This will access and mutate the entry, adding 1 to the current count.
Python Code
The Python equivalent, counter.setdefault(item, 0), returns the value associated to the key if it exists, and If the key doesn’t exist in the dict (meaning there is no value), it set it to the default value, here 0 and returns it.
However, the result is a simple integer, not a reference to the dict value. Therefore, we need to increment the value explicitly with counter[item] = counter.get(item, 0) + 1.
Mutable vs Immutable Data Structures
In Rust, the or_insert call on a HashMap entry, give us a mutable reference to the entry inside the HashMap. Therefore the mutable state allow us to += its value, directly altering the HashMap. To do that we need * to dereference the mutable and get access to it. In Python, .setdefault returns a value, not a reference, so we must write it back to the dict with counter[item] =.
Hypothetical Python Enhanced Syntax
To achieve something similar to Rust in Python, we might want syntax like using a walrus operator:
(counter[item] := counter.get(item, 0)) += 1
Unfortunately, this isn’t valid Python syntax, although It is possible by altering the bytecode of a similar function (see codereview.stackexchange.com).
Enhanced versions
More pythonic code
To be clear, the python code showed previously in this article is not what you should aim for as It lack simplicity and conciseness. Here is a few improvement you might want to consider if you are looking for a more pythonic code.
Use collections.defaultdict instead of dict
The collections.defaultdict object allows you to define a default factory of value for each missing key that could be asked from the dict. By defining this default factory to int, the dict will generate a new int each time the key is missing. Then, the key is returned, effectively doing in valid Python what the hypothetical implementation using the walrus would do. Of course, it doesn’t looks like the Rust version at all anymore.
import collections
def foo(spam: list[str]) -> dict[str, int]:
counter = collections.defaultdict(int)
for item in spam:
counter[item] += 1
return counter
Use collections.Counter when you count something
What if we use something called Counter in the collections module? It should count stuff, right? Let’s change the counter definition first.
import collections
def foo(spam: list[str]) -> dict[str, int]:
counter = collections.Counter()
for item in spam:
counter[item] += 1
return counter
And it works. But that wouldn’t be very useful, isn’t it? I mean, we still have to perform a for loop and increment our-self. A object named “Counter” should be able to count stuff without our help, right? Well, it does.
def foo(spam: list[str]) -> dict[str, int]:
return collections.Counter(spam)
Yes, now the function is pretty much a useless and confusing abstraction over Counter, but you get the point. Count stuff in python, think about collections.Counter!
More rustonic code
Same as the python code, there is better implementation to perform that task in rust, slightly more complex to do in Python, that’s why I didn’t used it in this article, but we’ll see those implementation now.
Use of Entry.and_modify() when modifying something
You cannot get a reference to an Entry that could be vacant as the compiler will not compile, simply because without making sure an entry is Occupied, there is no guaranty that the reference of the value could be acquired (the Occupied.get() method allows you to get a ref to the value). So, you need to perform a Entry.or_insert() before the +=.
But, you want to perform the += only when there is already a value to increment, right? Let’s change our code to use Entry.and_modify() then.
fn foo(spam: Vec<String>) -> HashMap<String, i32> {
let mut counter: HashMap<String, i32> = HashMap::new();
for item in spam {
counter
.entry(item)
.and_modify(|counter| *counter += 1)
.or_insert(1);
}
counter
}
Doing so, the increment will only be performed when the value already exists in the HashMap, and if it doesn’t, the inserted value is set to 1, since we don’t perform the increment after the insertion.
No more loop, let’s go functional
If you are used to functools.reduce() in Python, the following won’t be difficult to understand. If you are not, I advise you to look into it before.
So, what do we do with our code? We want to do something for each value of the list of strings. Each time we perform this “something”, we take the result and give it back to the next operation. Sound pretty familiar, right? Let’s write it in Python first.
def foo(spam: list[str]) -> dict[str, int]:
return functools.reduce(
lambda counter, s:
counter | {s: counter.get(s, 0) + 1},
spam,
{},
)
The Rust functional implementation is slightly different, let’s see how it would hypothetically looks like in Python.
def foo(spam: list[str]) -> dict[str, int]:
return spam.reduce(
lambda counter, s:
counter | {s: counter.get(s, 0) + 1},
{},
)
Of course, this will crash because str.reduce doesn’t exists, but it could make sense, right?
This str.reduce signature would looks like this:
def reduce(self: Iterable[_S],
function: (_T, _S) -> _T,
initial: _T) -> _T
Let’s alter it a little bit more (still won’t be functional code).
def foo(spam: list[str]) -> dict[str, int]:
return iter(spam).fold(
{},
lambda counter, s:
counter | {s: counter.get(s, 0) + 1},
)
Now, we first convert the string is a more generic object, an iterator, that could hypothetically have a function named “fold” that would simply be an alternative reduce where the initial value is the first parameter instead of the last one.
The signature of this hypothetical Iterator.fold function would looks like this:
def fold(self: Iterator[_S],
initial: _T,
function: (_T, _S) -> _T) -> _T
You start to see where I’m going? Let’s look at the Rust implementation.
fn foo(spam: Vec<String>) -> HashMap<String, i32> {
spam
.into_iter()
.fold(
HashMap::new(),
|mut counter, item| {
counter
.entry(item)
.and_modify(|counter| *counter += 1)
.or_insert(1);
counter
},
)
}
Yes, this behave exactly like our fictional python code. Easy, right? Maybe not. Let’s see.
spam.into_iter()is roughly equivalent toiter(spam)- .fold() is equivalent to the fictitious fold method we talked before.
|mut counter, item| {...}is called a closure and this is roughly equivalent to our lambdas.
This article was reviewed and improved by following reviewers, thank you
Antony Bara, Maxim Danilov.

Leave a Reply