mirror of
https://github.com/jbranchaud/til
synced 2026-01-03 07:08:01 +00:00
Add Find Records That Contain Duplicate Values as a postgres til
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.
 For a steady stream of TILs, [sign up for my newsletter](https://tinyletter.com/jbranchaud).

-_1121 TILs and counting..._
+_1122 TILs and counting..._

 ---
@@ -557,6 +557,7 @@ _1121 TILs and counting..._
 - [Escaping String Literals With Dollar Quoting](postgres/escaping-string-literals-with-dollar-quoting.md)
 - [Export Query Results To A CSV](postgres/export-query-results-to-a-csv.md)
 - [Extracting Nested JSON Data](postgres/extracting-nested-json-data.md)
+- [Find Records That Contain Duplicate Values](postgres/find-records-that-contain-duplicate-values.md)
 - [Find Records That Have Multiple Associated Records](postgres/find-records-that-have-multiple-associated-records.md)
 - [Find The Data Directory](postgres/find-the-data-directory.md)
 - [Find The Location Of Postgres Config Files](postgres/find-the-location-of-postgres-config-files.md)
47
postgres/find-records-that-contain-duplicate-values.md
Normal file
@@ -0,0 +1,47 @@
# Find Records That Contain Duplicate Values

Let's say I have a `mailing_list` table that contains all the email addresses
that I want to send a mailing out to. Without a uniqueness constraint on the
`email` column, I can end up with multiple records containing the same email
address, i.e. duplicates.
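
To make that scenario concrete, here is a minimal sketch of such a table with
a few duplicate rows to experiment with. Only the `mailing_list` table name
and its `email` column come from the description above; the `id` column, the
column types, and the sample addresses are assumptions for illustration.

```sql
-- Hypothetical schema: the id column and the types are assumptions.
create table mailing_list (
  id serial primary key,
  email text not null  -- no unique constraint, so duplicates are allowed
);

-- Made-up sample data: alice appears three times, bob twice, carol once.
insert into mailing_list (email) values
  ('alice@example.com'),
  ('bob@example.com'),
  ('alice@example.com'),
  ('carol@example.com'),
  ('bob@example.com'),
  ('alice@example.com');
```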

Here are a couple of queries for checking whether any duplicate records exist
and which ones they are.

```sql
select email
from (
  select
    email,
    row_number() over (
      partition by email
      order by email
    ) as row_num
  from mailing_list
) t
where t.row_num > 1;
```

This is cool because it uses a [window
function](https://www.postgresql.org/docs/current/tutorial-window.html),
specifically
[`row_number()`](https://www.postgresql.org/docs/current/functions-window.html),
to assign an incrementing number to each row within its partition. Every row
that shares an email lands in the same partition, so any row numbered above `1`
is a repeat of an email that has already been seen.
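
To see what the numbering looks like, it can help to run the subquery on its
own. The first statement below is a sketch of that. One consequence worth
knowing: an address stored three times gets row numbers 1, 2, and 3, so the
`row_num > 1` filter returns it twice. The second statement is a variant (my
assumption about a commonly wanted output, not part of the original query)
that uses `distinct` to list each duplicated email once.

```sql
-- Inspect the numbering the subquery produces before it is filtered.
select
  email,
  row_number() over (partition by email order by email) as row_num
from mailing_list
order by email, row_num;

-- Variant: report each duplicated email only once, however many copies exist.
select distinct email
from (
  select
    email,
    row_number() over (partition by email) as row_num
  from mailing_list
) t
where t.row_num > 1;
```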

Here is another, conceptually simpler approach.

```sql
select
  email,
  count(*)
from mailing_list
group by email
having count(*) > 1
order by email;
```

Though we cannot use a `where` clause with an aggregate (`count`), we can reach
for a `having` clause to keep only those groups whose count is greater than
`1`, which are the duplicates.
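
The two clauses can also be combined, since `where` filters individual rows
before grouping and `having` filters the groups afterwards. The sketch below
illustrates that split; the `example.com` domain filter is a made-up condition
used only for demonstration.

```sql
select
  email,
  count(*)
from mailing_list
where email like '%@example.com'  -- row-level filter, applied before grouping
group by email
having count(*) > 1               -- group-level filter, applied after counting
order by email;
```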

[source](https://www.postgresqltutorial.com/how-to-delete-duplicate-rows-in-postgresql/)