mirror of https://github.com/jbranchaud/til synced 2026-03-07 00:18:46 +00:00
# Use Verbose Flag To Get More Diff
Here is the output of running some `pytest` unit tests. Two of the tests pass,
which produces little output, but the one failing test gets a big block of
details. In this case, the failure is an equality assertion on two lists that
don't match.
```bash
uv run pytest
========================================== test session starts ==========================================
platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0
rootdir: /Users/lastword/dev/misc/build-an-llm
configfile: pyproject.toml
collected 3 items
tests/chapter_02/test_bpe_tokenizer.py .F. [100%]
=============================================== FAILURES ================================================
_____________________________________ test_merge_with_byte_sequence _____________________________________
    def test_merge_with_byte_sequence():
        token_ids = [1, 2, 3, 4, 5, 2, 3, 1, 2, 3, 4, 1]
        merged_tokens = BPETokenizer._merge(token_ids, [2, 3, 4], 256)
        # assert merged_tokens == [1, 256, 5, 2, 3, 1, 256, 1]
>       assert merged_tokens == [1, 256, 5, 4, 5, 1, 256, 1]
E       assert [1, 256, 5, 2, 3, 1, ...] == [1, 256, 5, 4, 5, 1, ...]
E
E         At index 3 diff: 2 != 4
E         Use -v to get more diff
tests/chapter_02/test_bpe_tokenizer.py:13: AssertionError
======================================== short test summary info ========================================
FAILED tests/chapter_02/test_bpe_tokenizer.py::test_merge_with_byte_sequence - assert [1, 256, 5, 2, 3, 1, ...] == [1, 256, 5, 4, 5, 1, ...]
====================================== 1 failed, 2 passed in 0.02s ======================================
```
The lists are too long to fully display in the failure output, so `pytest`
elides the tail of each with `...`. It still tells us two useful things. First,
the first discrepancy between the lists is at index `3`, where `2 != 4`. Second,
it says `Use -v to get more diff`.

Let's try rerunning the tests with `-v`.
```bash
uv run pytest -v
========================================== test session starts ==========================================
platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /Users/lastword/dev/misc/build-an-llm/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/lastword/dev/misc/build-an-llm
configfile: pyproject.toml
collected 3 items
tests/chapter_02/test_bpe_tokenizer.py::test_merge_with_byte_pair PASSED [ 33%]
tests/chapter_02/test_bpe_tokenizer.py::test_merge_with_byte_sequence FAILED [ 66%]
tests/chapter_02/test_bpe_tokenizer.py::test_subsequence_at_index PASSED [100%]
=============================================== FAILURES ================================================
_____________________________________ test_merge_with_byte_sequence _____________________________________
    def test_merge_with_byte_sequence():
        token_ids = [1, 2, 3, 4, 5, 2, 3, 1, 2, 3, 4, 1]
        merged_tokens = BPETokenizer._merge(token_ids, [2, 3, 4], 256)
        # assert merged_tokens == [1, 256, 5, 2, 3, 1, 256, 1]
>       assert merged_tokens == [1, 256, 5, 4, 5, 1, 256, 1]
E       AssertionError: assert [1, 256, 5, 2, 3, 1, ...] == [1, 256, 5, 4, 5, 1, ...]
E
E         At index 3 diff: 2 != 4
E
E         Full diff:
E           [
E               1,
E               256,...
E
E         ...Full output truncated (13 lines hidden), use '-vv' to show
tests/chapter_02/test_bpe_tokenizer.py:13: AssertionError
======================================== short test summary info ========================================
FAILED tests/chapter_02/test_bpe_tokenizer.py::test_merge_with_byte_sequence - AssertionError: assert [1, 256, 5, 2, 3, 1, ...] == [1, 256, 5, 4, 5, 1, ...]
====================================== 1 failed, 2 passed in 0.02s ======================================
```
That was a bit of a tease: it starts to display a "Full diff", but the diff is
truncated after a few lines. `pytest` then tells us that we can
`use '-vv' to show` the full output, so let's run that.
```bash
uv run pytest -vv
========================================== test session starts ==========================================
platform darwin -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /Users/lastword/dev/misc/build-an-llm/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/lastword/dev/misc/build-an-llm
configfile: pyproject.toml
collected 3 items
tests/chapter_02/test_bpe_tokenizer.py::test_merge_with_byte_pair PASSED [ 33%]
tests/chapter_02/test_bpe_tokenizer.py::test_merge_with_byte_sequence FAILED [ 66%]
tests/chapter_02/test_bpe_tokenizer.py::test_subsequence_at_index PASSED [100%]
=============================================== FAILURES ================================================
_____________________________________ test_merge_with_byte_sequence _____________________________________
    def test_merge_with_byte_sequence():
        token_ids = [1, 2, 3, 4, 5, 2, 3, 1, 2, 3, 4, 1]
        merged_tokens = BPETokenizer._merge(token_ids, [2, 3, 4], 256)
        # assert merged_tokens == [1, 256, 5, 2, 3, 1, 256, 1]
>       assert merged_tokens == [1, 256, 5, 4, 5, 1, 256, 1]
E       assert [1, 256, 5, 2, 3, 1, 256, 1] == [1, 256, 5, 4, 5, 1, 256, 1]
E
E         At index 3 diff: 2 != 4
E
E         Full diff:
E           [
E               1,
E               256,
E               5,
E         -     4,
E         ?     ^
E         +     2,
E         ?     ^
E         -     5,
E         ?     ^
E         +     3,
E         ?     ^
E               1,
E               256,
E               1,
E           ]
tests/chapter_02/test_bpe_tokenizer.py:13: AssertionError
======================================== short test summary info ========================================
FAILED tests/chapter_02/test_bpe_tokenizer.py::test_merge_with_byte_sequence - assert [1, 256, 5, 2, 3, 1, 256, 1] == [1, 256, 5, 4, 5, 1, 256, 1]
  At index 3 diff: 2 != 4
  Full diff:
    [
        1,
        256,
        5,
  -     4,
  ?     ^
  +     2,
  ?     ^
  -     5,
  ?     ^
  +     3,
  ?     ^
        1,
        256,
        1,
    ]
====================================== 1 failed, 2 passed in 0.02s ======================================
```
This is a lot more output to look at. What we can now see more clearly is that
the lists match until index `3`, where the actual value is `2` and the expected
value is `4`. Right after that, at index `4`, there is another mismatch between
`3` and `5`.
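
The `-`, `+`, and `?` markers in the full diff are the same ones that Python's
`difflib.ndiff` produces: `-` lines come from the expected list and `+` lines
from the actual one. Here is a minimal sketch of building a comparable diff
outside of `pytest`. It assumes that putting one element per line is close
enough to how `pytest` lays the lists out. The helper name `list_diff` is made
up for illustration, and `pytest`'s own formatting differs in the details, so
the `?` hint lines may not show up here.

```python
import difflib
import pprint


def list_diff(expected, actual):
    """Return ndiff-style lines comparing two lists, one element per line.

    Hypothetical helper for illustration; not part of pytest.
    """
    # width=1 forces pprint to put every element on its own line.
    expected_lines = pprint.pformat(expected, width=1).splitlines()
    actual_lines = pprint.pformat(actual, width=1).splitlines()
    # Lines starting with "- " exist only in `expected`,
    # lines starting with "+ " exist only in `actual`.
    return list(difflib.ndiff(expected_lines, actual_lines))


expected = [1, 256, 5, 4, 5, 1, 256, 1]
actual = [1, 256, 5, 2, 3, 1, 256, 1]

diff_lines = list_diff(expected, actual)
print("\n".join(diff_lines))
```

Reading only the `-` and `+` lines is usually enough to spot which elements
were expected and which ones showed up instead.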

This kind of output only scales so far. Use it while it works, and when the
diff view starts to fall short, rework the assertions to get more readable and
actionable test output.