Add Find Records That Contain Duplicate Values as a postgres til

2026-07-05 17:00:17 +00:00 · 2021-05-12 11:38:15 -05:00
parent 753636c83f
commit 9e89734285
2 changed files with 49 additions and 1 deletions
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.

 For a steady stream of TILs, [sign up for my newsletter](https://tinyletter.com/jbranchaud).

-_1121 TILs and counting..._
+_1122 TILs and counting..._

 ---

@@ -557,6 +557,7 @@ _1121 TILs and counting..._
 - [Escaping String Literals With Dollar Quoting](postgres/escaping-string-literals-with-dollar-quoting.md)
 - [Export Query Results To A CSV](postgres/export-query-results-to-a-csv.md)
 - [Extracting Nested JSON Data](postgres/extracting-nested-json-data.md)
+- [Find Records That Contain Duplicate Values](postgres/find-records-that-contain-duplicate-values.md)
 - [Find Records That Have Multiple Associated Records](postgres/find-records-that-have-multiple-associated-records.md)
 - [Find The Data Directory](postgres/find-the-data-directory.md)
 - [Find The Location Of Postgres Config Files](postgres/find-the-location-of-postgres-config-files.md)
@@ -0,0 +1,47 @@
+# Find Records That Contain Duplicate Values
+
+Let's say I have a `mailing_list` table that contains all the email addresses
+that I want to send a mailing out to. Without a uniqueness constraint on the
+`email` column, I can end up with multiple records containing the same email
+address — duplicates.
+
+Here are a couple of queries for checking whether any duplicate records exist
+and which ones they are.
+
+```sql
+select email
+from (
+  select
+    email,
+    row_number() over (
+      partition by email
+      order by email
+    ) as row_num
+  from mailing_list
+) t
+where t.row_num > 1;
+```
+
+This is cool because it uses a [window
+function](https://www.postgresql.org/docs/current/tutorial-window.html),
+specifically the
+[`row_number()`](https://www.postgresql.org/docs/current/functions-window.html)
+window function, to assign an incrementing number to each row in the partition.
+Any row numbered higher than `1` repeats an email already seen, so an email
+that appears three times shows up twice in the results.
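+
+To see the numbering itself, the inner query can be run on its own, as in this
+sketch against the same `mailing_list` table. Each email's first row gets `1`
+and every repeat counts up from there.
+
+```sql
+-- Show every row alongside its position within its email partition.
+select
+  email,
+  row_number() over (
+    partition by email
+    order by email
+  ) as row_num
+from mailing_list
+order by email, row_num;
+```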
+
+Here is another, conceptually simpler approach.
+
+```sql
+select
+  email,
+  count(*)
+from mailing_list
+group by email
+having count(*) > 1
+order by email;
+```
+
+We cannot filter on an aggregate (`count`) in a `where` clause because `where`
+runs before the rows are grouped. A `having` clause filters after grouping, so
+we can reach for it to keep only the emails that appear more than once.
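+
+Both queries return only the `email` values. To pull back the full records,
+one option is to feed the grouped query into an `in` subquery. This is a
+sketch that assumes the same `mailing_list` table; it selects `*` because the
+table's other columns are not described here.
+
+```sql
+-- Return every row whose email appears more than once in the table.
+select *
+from mailing_list
+where email in (
+  select email
+  from mailing_list
+  group by email
+  having count(*) > 1
+)
+order by email;
+```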
+
+[source](https://www.postgresqltutorial.com/how-to-delete-duplicate-rows-in-postgresql/)