diff --git a/README.md b/README.md
index fb7e992..e2cdbcf 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ working across different projects via [VisualMode](https://www.visualmode.dev/).
 
 For a steady stream of TILs, [sign up for my newsletter](https://visualmode.kit.com/newsletter).
 
-_1744 TILs and counting..._
+_1745 TILs and counting..._
 
 See some of the other learning resources I work on:
 
@@ -1039,6 +1039,7 @@ If you've learned something here, support my efforts writing daily TILs by
 - [Dunder Methods](python/dunder-methods.md)
 - [Install With PIP For Specific Interpreter](python/install-with-pip-for-specific-interpreter.md)
 - [Iterate First N Items From Enumerable](python/iterate-first-n-items-from-enumerable.md)
+- [Keep A Tally With collections.Counter](python/keep-a-tally-with-collections-counter.md)
 - [Load A File Into The Python REPL](python/load-a-file-into-the-python-repl.md)
 - [Override The Boolean Context Of A Class](python/override-the-boolean-context-of-a-class.md)
 - [Store And Access Immutable Data In A Tuple](python/store-and-access-immutable-data-in-a-tuple.md)
diff --git a/python/keep-a-tally-with-collections-counter.md b/python/keep-a-tally-with-collections-counter.md
new file mode 100644
index 0000000..3631cd2
--- /dev/null
+++ b/python/keep-a-tally-with-collections-counter.md
@@ -0,0 +1,40 @@
+# Keep A Tally With collections.Counter
+
+Python's `collections` module comes with a
+[`Counter`](https://docs.python.org/3/library/collections.html#collections.Counter)
+object which is a specialized dict subclass focussed on tallying counts of keys.
+
+> It is a collection where elements are stored as dictionary keys and their
+> counts are stored as dictionary values. Counts are allowed to be any integer
+> value including zero or negative counts.
+
+I used it recently while doing an exploratory implementation of Byte-Pair
+Encoding (BPE):
+
+```python
+from collections import Counter
+
+def get_pair_counts(token_ids: list[int]) -> Counter:
+    """Count how often each adjacent pair appears"""
+    counts = Counter()
+    for i in range(len(token_ids) - 1):
+        pair = (token_ids[i], token_ids[i + 1])
+        counts[pair] += 1
+    return counts
+```
+
+Here I'm able to count the number of occurrences of each pair of bytes from the
+input text. Tuples of `int` values are hashable, so they work great as keys for
+a `Counter`.
+
+The count value of any key will default to `0`. That makes it straightforward to
+increment from there as you iterate over occurrences.
+
+```python
+>>> counts = Counter()
+>>> counts['hello']
+0
+>>> counts['hello'] += 1
+>>> counts['hello']
+1
+```
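
A follow-on sketch, not part of the patch itself: the same tally can also be built by handing the adjacent pairs straight to the `Counter` constructor, and `most_common` then surfaces the most frequent pair, which is presumably the one a BPE merge step would look at next. The `get_pair_counts_zip` name and the sample `token_ids` are made up for illustration.

```python
from collections import Counter


def get_pair_counts_zip(token_ids: list[int]) -> Counter:
    """Count adjacent pairs by passing an iterable of tuples to Counter."""
    # zip(token_ids, token_ids[1:]) yields (a, b) for each adjacent pair.
    return Counter(zip(token_ids, token_ids[1:]))


# Hypothetical token ids, chosen only to show the tallying behaviour.
token_ids = [1, 2, 3, 1, 2, 1, 2]
counts = get_pair_counts_zip(token_ids)

# most_common(1) returns a one-element list holding the highest-count
# (key, count) tuple.
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # (1, 2) 3

# Reading a key that was never counted returns 0 without adding the key.
print(counts[(9, 9)])       # 0
print((9, 9) in counts)     # False
```

Building the `Counter` from an iterable avoids the explicit index loop, while the `counts[pair] += 1` form from the TIL is handy when the tally is updated incrementally.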