diff --git a/README.md b/README.md
index 8f03574..2a5a699 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ working across different projects via [VisualMode](https://www.visualmode.dev/).
 For a steady stream of TILs, [sign up for my
 newsletter](https://visualmode.kit.com/newsletter).
 
-_1748 TILs and counting..._
+_1749 TILs and counting..._
 
 See some of the other learning resources I work on:
 
@@ -1044,6 +1044,7 @@ If you've learned something here, support my efforts writing daily TILs by
 - [Check If Package Is Installed With Pip](python/check-if-package-is-installed-with-pip.md)
 - [Create A Dummy DataFrame In Pandas](python/create-a-dummy-dataframe-in-pandas.md)
 - [Dunder Methods](python/dunder-methods.md)
+- [Easy Key-Value Aggregates With defaultdict](python/easy-key-value-aggregates-with-defaultdict.md)
 - [Install With PIP For Specific Interpreter](python/install-with-pip-for-specific-interpreter.md)
 - [Iterate First N Items From Enumerable](python/iterate-first-n-items-from-enumerable.md)
 - [Keep A Tally With collections.Counter](python/keep-a-tally-with-collections-counter.md)
diff --git a/python/easy-key-value-aggregates-with-defaultdict.md b/python/easy-key-value-aggregates-with-defaultdict.md
new file mode 100644
index 0000000..b007e35
--- /dev/null
+++ b/python/easy-key-value-aggregates-with-defaultdict.md
@@ -0,0 +1,53 @@
+# Easy Key-Value Aggregates With defaultdict
+
+The `collections` module has the `defaultdict` object that can be used to
+aggregate values tied to a key. What sets this apart from simply using a `dict`
+is that we get the base value for free. So if our aggregate value is a list,
+then we get `[]` by default for each new key. In the same way, we'd get `0` if
+it was constructed with `int`.
+
+Here is the counting example from [Keep A Tally With
+collections.Counter](keep-a-tally-with-collections-counter.md) using `defaultdict`:
+
+```python
+from collections import defaultdict
+
+def get_pair_counts(token_ids: list[int]) -> defaultdict[tuple[int, int], int]:
+    """Count how often each adjacent pair appears"""
+    counts = defaultdict(int)
+    for i in range(len(token_ids) - 1):
+        pair = (token_ids[i], token_ids[i + 1])
+        counts[pair] += 1
+    return counts
+```
+
+We never have to initially set a key to `0`. If the key is not yet present, then
+`__missing__` calls `int()`, which returns `0`, and stores that as the value.
+
+We can do the same with `list`:
+
+```python
+>>> import collections
+>>> stuff = collections.defaultdict(list)
+>>> stuff['alpha'].append(1)
+>>> stuff['alpha']
+[1]
+>>> stuff['beta']
+[]
+```
+
+In the same way, `__missing__` calls `list()` here to start off each key with
+an empty list (`[]`).
+
+I find this so handy because in other languages I've typically had to do
+something more like this (written here in Python with a plain `dict`):
+
+```python
+words_by_length = {}
+for item in items:
+    if len(item) not in words_by_length:
+        words_by_length[len(item)] = []
+    words_by_length[len(item)].append(item)
+```
+
+This is much clunkier.
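For comparison, here is a minimal sketch of that last loop rewritten with `defaultdict(list)`. The `items` list is made-up sample data, since the snippet in the TIL leaves it undefined; `words_by_length` keeps the name used there.

```python
from collections import defaultdict

# Hypothetical sample data; the original snippet does not define `items`.
items = ["a", "to", "be", "cat", "dog", "bird"]

words_by_length = defaultdict(list)
for item in items:
    # A missing key is created with list() on first access,
    # so no membership check is needed before appending.
    words_by_length[len(item)].append(item)

print(dict(words_by_length))
# {1: ['a'], 2: ['to', 'be'], 3: ['cat', 'dog'], 4: ['bird']}
```

One thing to keep in mind with this approach: merely reading a missing key (as with `stuff['beta']` in the REPL session) also inserts it, so `in` or `.get()` may be the safer choice for lookups that should not grow the dict.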