mirror of https://github.com/jbranchaud/til synced 2026-07-03 08:08:24 +00:00

Add Use __post_init__ For dataclass Validations as a Python TIL

jbranchaud
2026-03-22 14:37:59 -05:00
parent eb0a7e1b3d
commit 8af252f232
2 changed files with 56 additions and 1 deletions
+2 -1
@@ -10,7 +10,7 @@ working across different projects via [VisualMode](https://www.visualmode.dev/).
For a steady stream of TILs, [sign up for my newsletter](https://visualmode.kit.com/newsletter).
_1763 TILs and counting..._
_1764 TILs and counting..._
See some of the other learning resources I work on:
@@ -1063,6 +1063,7 @@ If you've learned something here, support my efforts writing daily TILs by
- [Store And Access Immutable Data In A Tuple](python/store-and-access-immutable-data-in-a-tuple.md)
- [Test A Function With Pytest](python/test-a-function-with-pytest.md)
- [Use pipx To Install End User Apps](python/use-pipx-to-install-end-user-apps.md)
- [Use `__post_init__` For `dataclass` Validations](python/use-post-init-for-dataclass-validations.md)
- [Use Verbose Flag To Get More Diff](python/use-verbose-flag-to-get-more-diff.md)
### Rails
@@ -0,0 +1,54 @@
# Use `__post_init__` For `dataclass` Validations
The [`dataclass`](https://docs.python.org/3/library/dataclasses.html) decorator
is a handy stdlib way to model data. It offers several improvements over a
`dict`, such as named attributes and visible field types.
```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class BPEConfig:
    BASE_VOCAB_SIZE: ClassVar[int] = 256

    vocab_size: int
    special_tokens: list[str]
```
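As a rough illustration of those improvements, the sketch below builds a `BPEConfig` and looks at what the decorator generates. The class is repeated so the snippet runs on its own (Python 3.9+ is assumed for `list[str]`), and the token string is a made-up value.

```python
from dataclasses import dataclass, fields
from typing import ClassVar


@dataclass
class BPEConfig:
    BASE_VOCAB_SIZE: ClassVar[int] = 256

    vocab_size: int
    special_tokens: list[str]


# The token string here is a made-up value for illustration.
config = BPEConfig(vocab_size=512, special_tokens=["<|endoftext|>"])

# Fields are read as attributes rather than string keys.
print(config.vocab_size)

# The decorator generates __repr__ and __eq__ from the annotated fields.
print(config)
print(config == BPEConfig(512, ["<|endoftext|>"]))

# ClassVar annotations are excluded from the fields and from __init__.
print([f.name for f in fields(BPEConfig)])
```

Because `BASE_VOCAB_SIZE` is annotated with `ClassVar`, it stays a class-level constant and is not a constructor argument.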
I want to enhance `BPEConfig` a little by validating `vocab_size`, which
cannot be less than `BASE_VOCAB_SIZE`. The generated `__init__` calls
[`__post_init__`](https://docs.python.org/3/library/dataclasses.html#dataclasses.__post_init__)
after assigning the fields, so that method is a good place for this kind of
validation.
```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class BPEConfig:
    BASE_VOCAB_SIZE: ClassVar[int] = 256

    vocab_size: int
    special_tokens: list[str]

    def __post_init__(self):
        if self.vocab_size < self.BASE_VOCAB_SIZE:
            msg = f"vocab_size ({self.vocab_size}) must be greater than or equal to BASE_VOCAB_SIZE ({self.BASE_VOCAB_SIZE})"
            raise ValueError(msg)
```
With this in place, my program will fail fast if I try to use an invalid
`vocab_size`:
```python
>>> BPEConfig(22, [])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 5, in __init__
  File "/Users/lastword/dev/misc/build-an-llm/chapter_02/bpe_tokenizer.py", line 24, in __post_init__
    raise ValueError(msg)
ValueError: vocab_size (22) must be greater than or equal to BASE_VOCAB_SIZE (256)
```
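In a script rather than the REPL, the same error can be handled at the call site. The sketch below repeats the class so it runs on its own; `load_config` is a hypothetical helper, not something from the linked tokenizer. It also shows one limit worth knowing: `__post_init__` runs only during construction, so a later assignment is not re-validated.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class BPEConfig:
    BASE_VOCAB_SIZE: ClassVar[int] = 256

    vocab_size: int
    special_tokens: list[str]

    def __post_init__(self):
        if self.vocab_size < self.BASE_VOCAB_SIZE:
            msg = f"vocab_size ({self.vocab_size}) must be greater than or equal to BASE_VOCAB_SIZE ({self.BASE_VOCAB_SIZE})"
            raise ValueError(msg)


def load_config(vocab_size):
    """Hypothetical helper: return a BPEConfig, or None if validation fails."""
    try:
        return BPEConfig(vocab_size, [])
    except ValueError as err:
        print(f"invalid config: {err}")
        return None


valid = load_config(256)   # equal to BASE_VOCAB_SIZE, so it passes
invalid = load_config(22)  # too small, so __post_init__ raises

# Assignment after construction does not call __post_init__ again,
# so this invalid value is accepted silently.
valid.vocab_size = 22
```

If the value must stay valid after construction, one option is to declare the class with `@dataclass(frozen=True)`, which makes attribute assignment raise `FrozenInstanceError`.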
This example is pulled directly from [the `BPETokenizer` I'm building](https://github.com/jbranchaud/build-an-llm-from-scratch/blob/d3fd0acd65c3e7419b2d15a64c8d74266d0488f6/chapter_02/bpe_tokenizer.py#L14-L24).