mirror of
https://github.com/jbranchaud/til
synced 2026-03-04 06:58:45 +00:00
Add Compute Median Instead of Average as a PostgreSQL TIL
@@ -10,7 +10,7 @@ working across different projects via [VisualMode](https://www.visualmode.dev/).

 For a steady stream of TILs, [sign up for my newsletter](https://visualmode.kit.com/newsletter).

-_1735 TILs and counting..._
+_1736 TILs and counting..._

 See some of the other learning resources I work on:

@@ -863,6 +863,7 @@ If you've learned something here, support my efforts writing daily TILs by
 - [Clear The Screen In psql](postgres/clear-the-screen-in-psql.md)
 - [Clear The Screen In psql (2)](postgres/clear-the-screen-in-psql-2.md)
 - [Compute Hashes With pgcrypto](postgres/compute-hashes-with-pgcrypto.md)
+- [Compute Median Instead Of Average](postgres/compute-median-instead-of-average.md)
 - [Compute The Levenshtein Distance Of Two Strings](postgres/compute-the-levenshtein-distance-of-two-strings.md)
 - [Compute The md5 Hash Of A String](postgres/compute-the-md5-hash-of-a-string.md)
 - [Concatenate Strings With A Separator](postgres/concatenate-strings-with-a-separator.md)
44
postgres/compute-median-instead-of-average.md
Normal file
@@ -0,0 +1,44 @@
# Compute Median Instead Of Average

One of the first aggregate functions we might use in PostgreSQL, besides `sum`,
is `avg`.

```sql
select avg(book_count) as average_books_read
from (
  select users.id, count(books.id) as book_count
  from users
  left join books
    on books.user_id = users.id
    -- filter in the join so users with no 2025 books are kept with a count of 0
    and books.read_in_year = 2025
  group by users.id
) as user_book_counts;
```

This computes the average of the set of values: it sums them all up and divides
by the count. The average (maybe you've heard this also called the _mean_) is
not always the best way to understand data, especially when there are outliers
that pull it away from what is typical.
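
To make the effect of an outlier concrete, here is a small sketch that is not part of the queries in this TIL. The `sample_counts` name and its five values are made up for illustration; the only functions used are `avg` and `max`.

```sql
-- Hypothetical data: five readers, one of whom read far more than the rest.
with sample_counts(book_count) as (
  values (2), (3), (4), (5), (100)
)
select
  avg(book_count) as average_books_read,  -- (2 + 3 + 4 + 5 + 100) / 5 = 22.8
  max(book_count) as most_books_read      -- the outlier, 100
from sample_counts;
```

Four of the five readers read five books or fewer, yet the average is 22.8, which describes none of them well.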

Instead, we might want to compute the _median_ value of our set of data.
PostgreSQL has no built-in `median` aggregate function. Instead, we can use
`percentile_cont` with a value of `0.5`. This gets us the 50th percentile of our
set of data, which is the definition of the _median_.

```sql
select percentile_cont(0.5) within group (
  order by book_count
) as median_books_read
from (
  select users.id, count(books.id) as book_count
  from users
  left join books on books.user_id = users.id and books.read_in_year = 2025
  group by users.id
) as user_book_counts;
```

The full syntax for `percentile_cont` is
`percentile_cont(fraction) within group (order by ...)` because this is an
ordered-set aggregate: it has to work with an ordered set of data.
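
As a rough illustration of that syntax, the sketch below calls `percentile_cont` on an even number of rows. The inline `sample_counts` values are invented for the example and do not come from the `users` and `books` tables. With no single middle row, `percentile_cont` interpolates between the two middle values.

```sql
-- Hypothetical data: four readers, so there is no single middle row.
with sample_counts(book_count) as (
  values (1), (2), (3), (100)
)
select
  -- halfway between the two middle values, 2 and 3
  percentile_cont(0.5) within group (order by book_count) as median_books_read,  -- 2.5
  -- (1 + 2 + 3 + 100) / 4, pulled up by the outlier
  avg(book_count) as average_books_read                                          -- 26.5
from sample_counts;
```

The median stays close to what most readers did, while the average is dominated by the single large value.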

[source](https://www.postgresql.org/docs/current/functions-aggregate.html)