From f186d5977dbcc6aafaf7f523e79f085319a5ecd9 Mon Sep 17 00:00:00 2001
From: jbranchaud
Date: Mon, 2 Feb 2026 16:55:50 -0600
Subject: [PATCH] Add Compute Median Instead of Average as a PostgreSQL TIL

---
 README.md                                     |  3 +-
 postgres/compute-median-instead-of-average.md | 44 +++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 postgres/compute-median-instead-of-average.md

diff --git a/README.md b/README.md
index b4cfbc4..3d969a2 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ working across different projects via [VisualMode](https://www.visualmode.dev/).
 
 For a steady stream of TILs, [sign up for my newsletter](https://visualmode.kit.com/newsletter).
 
-_1735 TILs and counting..._
+_1736 TILs and counting..._
 
 See some of the other learning resources I work on:
 
@@ -863,6 +863,7 @@ If you've learned something here, support my efforts writing daily TILs by
 - [Clear The Screen In psql](postgres/clear-the-screen-in-psql.md)
 - [Clear The Screen In psql (2)](postgres/clear-the-screen-in-psql-2.md)
 - [Compute Hashes With pgcrypto](postgres/compute-hashes-with-pgcrypto.md)
+- [Compute Median Instead Of Average](postgres/compute-median-instead-of-average.md)
 - [Compute The Levenshtein Distance Of Two Strings](postgres/compute-the-levenshtein-distance-of-two-strings.md)
 - [Compute The md5 Hash Of A String](postgres/compute-the-md5-hash-of-a-string.md)
 - [Concatenate Strings With A Separator](postgres/concatenate-strings-with-a-separator.md)
diff --git a/postgres/compute-median-instead-of-average.md b/postgres/compute-median-instead-of-average.md
new file mode 100644
index 0000000..1e8ebe5
--- /dev/null
+++ b/postgres/compute-median-instead-of-average.md
@@ -0,0 +1,44 @@
+# Compute Median Instead Of Average
+
+One of the first aggregate functions we might use in PostgreSQL, besides `sum`,
+is `avg`.
+ +```sql +select avg(book_count) as average_books_read +from ( + select users.id, count(books.id) as book_count + from users + left join books + on books.user_id = users.id + where books.read_in_year = 2025 + group by users.id +) as user_book_counts; +``` + +This computes the average of the set of values which sums them all up +and divides by the count. The average (maybe you've heard this also called the +_mean_) is not always the best way to understand data, especially when there are +outliers. + +Instead, we might want to compute the _median_ value of our set of data. There +is no easily identifiable `median` aggregate function. Instead, we can use +`percentile_cont` with a value of `0.5`. This gets us the 50th percentile of our +set of data which is the definition of the _median_. + +```sql +select percentile_cont(0.5) within group ( + order by book_count +) as median_books_read +from ( + select users.id, count(books.id) as book_count + from users + left join books on books.user_id = users.id and books.read_in_year = 2025 + group by users.id +) as user_book_counts; +``` + +The full syntax for `percentile_cont` is `percentile_cong(precision) within +group (order by ...)` because this is an aggregiate that has to work with an +ordered-set of data. + +[source](https://www.postgresql.org/docs/current/functions-aggregate.html)