mirror of https://github.com/jbranchaud/til synced 2026-03-03 22:48:45 +00:00

Easy Key-Value Aggregates With defaultdict

The collections module has the defaultdict object that can be used to aggregate values tied to a key. What sets this apart from simply using a dict is that we get the base value for free. So if our aggregate value is a list, then we get [] by default for each new key. In the same way, we'd get 0 if it was constructed with int.

Here is the counting example from Keep A Tally With collections.Counter, this time using defaultdict:

from collections import defaultdict

def get_pair_counts(token_ids: list[int]) -> defaultdict[tuple[int, int], int]:
    """Count how often each adjacent pair appears"""
    counts = defaultdict(int)
    for i in range(len(token_ids) - 1):
        pair = (token_ids[i], token_ids[i + 1])
        counts[pair] += 1
    return counts

We never have to initially set a key to 0. If the key is not yet present, the defaultdict's __missing__ hook calls int(), which returns 0, and stores that as the key's starting value.
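To make that mechanism concrete, here is a rough sketch of the same behavior built on a plain dict subclass. ZeroDict is a hypothetical name used only for illustration, and this is not how defaultdict is actually implemented; it just shows the __missing__ hook that dict lookups fall back to.

```python
class ZeroDict(dict):
    """Hypothetical stand-in for defaultdict(int), for illustration only."""

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when `key` is absent.
        value = int()      # int() returns 0
        self[key] = value  # store it so later lookups find the key
        return value


counts = ZeroDict()
counts["a"] += 1  # missing key: __missing__ supplies 0, then 1 is stored
counts["a"] += 1
print(counts)  # {'a': 2}
```

In practice defaultdict(int) is the better choice, since it does this for any factory you hand it.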

We can do the same with list:

>>> import collections
>>> stuff = collections.defaultdict(list)
>>> stuff['alpha'].append(1)
>>> stuff['alpha']
[1]
>>> stuff['beta']
[]

In the same way, this calls list() from __missing__ to start off each new key with an [].
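One consequence worth knowing: simply indexing a missing key, as with stuff['beta'] above, inserts that key. A small sketch of which operations trigger the default factory and which do not:

```python
from collections import defaultdict

stuff = defaultdict(list)

# Neither a membership test nor .get() triggers the default factory.
print("beta" in stuff)    # False
print(stuff.get("beta"))  # None

# Indexing does: it calls list() and stores the result under the key.
stuff["beta"]
print("beta" in stuff)    # True
print(dict(stuff))        # {'beta': []}
```

So if you only want to check for a key without creating it, reach for in or .get() rather than square brackets.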

I find this so handy because in other languages I've typically had to do something more like this (written here in Python for comparison):

words_by_length = {}
for item in items:
    if len(item) not in words_by_length:
        words_by_length[len(item)] = []
    words_by_length[len(item)].append(item)

This is much clunkier.
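For comparison, here is a minimal sketch of the same grouping with defaultdict(list). The items list is made-up sample data, since the snippet above leaves items undefined.

```python
from collections import defaultdict

# Hypothetical sample data; the original snippet leaves `items` undefined.
items = ["to", "be", "or", "not", "that", "is"]

words_by_length = defaultdict(list)
for item in items:
    # No membership check needed: a missing length starts as [].
    words_by_length[len(item)].append(item)

print(dict(words_by_length))
# {2: ['to', 'be', 'or', 'is'], 3: ['not'], 4: ['that']}
```

The membership check and the explicit empty-list assignment both disappear, leaving just the append.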