diff --git a/README.md b/README.md
index 9f62f08..62ffa39 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.
 For a steady stream of TILs, [sign up for my
 newsletter](https://tinyletter.com/jbranchaud).
 
-_1121 TILs and counting..._
+_1122 TILs and counting..._
 
 ---
 
@@ -557,6 +557,7 @@ _1121 TILs and counting..._
 - [Escaping String Literals With Dollar Quoting](postgres/escaping-string-literals-with-dollar-quoting.md)
 - [Export Query Results To A CSV](postgres/export-query-results-to-a-csv.md)
 - [Extracting Nested JSON Data](postgres/extracting-nested-json-data.md)
+- [Find Records That Contain Duplicate Values](postgres/find-records-that-contain-duplicate-values.md)
 - [Find Records That Have Multiple Associated Records](postgres/find-records-that-have-multiple-associated-records.md)
 - [Find The Data Directory](postgres/find-the-data-directory.md)
 - [Find The Location Of Postgres Config Files](postgres/find-the-location-of-postgres-config-files.md)
diff --git a/postgres/find-records-that-contain-duplicate-values.md b/postgres/find-records-that-contain-duplicate-values.md
new file mode 100644
index 0000000..f3eb5a0
--- /dev/null
+++ b/postgres/find-records-that-contain-duplicate-values.md
@@ -0,0 +1,49 @@
+# Find Records That Contain Duplicate Values
+
+Let's say I have a `mailing_list` table that contains all the email addresses
+that I want to send a mailing out to. Without a uniqueness constraint on the
+`email` column, I can end up with duplicates: multiple records containing the
+same email address.
+
+Here are a couple of queries for checking whether any duplicate records exist
+and which ones they are.
+
+```sql
+select email
+from (
+  select
+    email,
+    row_number() over (
+      partition by email
+      order by email
+    ) as row_num
+  from mailing_list
+) t
+where t.row_num > 1;
+```
+
+This is cool because it uses a [window
+function](https://www.postgresql.org/docs/current/tutorial-window.html),
+specifically the
+[`row_number()`](https://www.postgresql.org/docs/current/functions-window.html)
+window function, to number the rows in each `email` partition starting at `1`.
+Any row numbered higher than `1` is a repeat, so an email that appears three
+times shows up twice in the results.
+
+Here is another, conceptually simpler approach.
+
+```sql
+select
+  email,
+  count(*)
+from mailing_list
+group by email
+having count(*) > 1
+order by email;
+```
+
+We can't filter on an aggregate (`count`) in a `where` clause, but we can reach
+for a `having` clause to keep only the groups with a count greater than `1`:
+the duplicates.
+
+[source](https://www.postgresqltutorial.com/how-to-delete-duplicate-rows-in-postgresql/)