diff --git a/README.md b/README.md
index e566fd7..fab56fc 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ working across different projects via [VisualMode](https://www.visualmode.dev/).
 
 For a steady stream of TILs, [sign up for my newsletter](https://visualmode.kit.com/newsletter).
 
-_1763 TILs and counting..._
+_1764 TILs and counting..._
 
 See some of the other learning resources I work on:
 
@@ -1063,6 +1063,7 @@ If you've learned something here, support my efforts writing daily TILs by
 - [Store And Access Immutable Data In A Tuple](python/store-and-access-immutable-data-in-a-tuple.md)
 - [Test A Function With Pytest](python/test-a-function-with-pytest.md)
 - [Use pipx To Install End User Apps](python/use-pipx-to-install-end-user-apps.md)
+- [Use `__post_init__` For `dataclass` Validations](python/use-post-init-for-dataclass-validations.md)
 - [Use Verbose Flag To Get More Diff](python/use-verbose-flag-to-get-more-diff.md)
 
 ### Rails
diff --git a/python/use-post-init-for-dataclass-validations.md b/python/use-post-init-for-dataclass-validations.md
new file mode 100644
index 0000000..c56f91c
--- /dev/null
+++ b/python/use-post-init-for-dataclass-validations.md
@@ -0,0 +1,54 @@
+# Use `__post_init__` For `dataclass` Validations
+
+The [`dataclass`](https://docs.python.org/3/library/dataclasses.html) construct
+is a handy stdlib way of modeling some data with many improvements over a `dict`
+such as named attributes and type visibility.
+
+```python
+from dataclasses import dataclass
+from typing import ClassVar
+
+@dataclass
+class BPEConfig:
+    BASE_VOCAB_SIZE: ClassVar[int] = 256
+
+    vocab_size: int
+    special_tokens: list[str]
+```
+
+I want to enhance `BPEConfig` a little by validating the `vocab_size` which
+cannot be less than the `BASE_VOCAB_SIZE`. The
+[`__post_init__`](https://docs.python.org/3/library/dataclasses.html#dataclasses.__post_init__)
+method is a good place for this kind of validation.
+
+```python
+from dataclasses import dataclass
+from typing import ClassVar
+
+@dataclass
+class BPEConfig:
+    BASE_VOCAB_SIZE: ClassVar[int] = 256
+
+    vocab_size: int
+    special_tokens: list[str]
+
+    def __post_init__(self):
+        if self.vocab_size < self.BASE_VOCAB_SIZE:
+            msg = f"vocab_size ({self.vocab_size}) must be greater than or equal to BASE_VOCAB_SIZE ({self.BASE_VOCAB_SIZE})"
+            raise ValueError(msg)
+```
+
+With this in place, my program will fail fast if I try to use an invalid
+`vocab_size`:
+
+```python
+>>> BPEConfig(22, [])
+Traceback (most recent call last):
+  File "", line 1, in
+  File "", line 5, in __init__
+  File "/Users/lastword/dev/misc/build-an-llm/chapter_02/bpe_tokenizer.py", line 24, in __post_init__
+    raise ValueError(msg)
+ValueError: vocab_size (22) must be greater than or equal to BASE_VOCAB_SIZE (256)
+```
+
+This example is pulled directly from [the `BPETokenizer` I'm building](https://github.com/jbranchaud/build-an-llm-from-scratch/blob/d3fd0acd65c3e7419b2d15a64c8d74266d0488f6/chapter_02/bpe_tokenizer.py#L14-L24).
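
A note on the TIL added by this patch, offered as a minimal sketch and not as part of the patch itself: `__post_init__` is called by the generated `__init__`, so the check runs when an instance is constructed, not when an attribute is reassigned later. The snippet below reuses the `BPEConfig` class from the TIL; the `config` and `replace_rejected` names are illustrative only.

```python
from dataclasses import dataclass, replace
from typing import ClassVar


@dataclass
class BPEConfig:
    BASE_VOCAB_SIZE: ClassVar[int] = 256

    vocab_size: int
    special_tokens: list[str]

    def __post_init__(self):
        if self.vocab_size < self.BASE_VOCAB_SIZE:
            msg = f"vocab_size ({self.vocab_size}) must be greater than or equal to BASE_VOCAB_SIZE ({self.BASE_VOCAB_SIZE})"
            raise ValueError(msg)


# A valid value passes the check at construction time.
config = BPEConfig(300, [])

# Plain attribute assignment does not go through __init__,
# so __post_init__ is not called and this is not rejected.
config.vocab_size = 22

# dataclasses.replace() builds a new instance through __init__,
# so the validation runs again and rejects the invalid value.
try:
    replace(config, vocab_size=10)
    replace_rejected = False
except ValueError:
    replace_rejected = True
```

If reassignment should also be guarded, one option is `@dataclass(frozen=True)`, which blocks attribute assignment after construction; whether that fits the tokenizer project is a design choice the patch does not address.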