mirror of https://github.com/jbranchaud/til
synced 2026-01-03 23:28:02 +00:00
Add Find Duplicate Lines In A File as a Unix TIL
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.
 For a steady stream of TILs, [sign up for my newsletter](https://crafty-builder-6996.ck.page/e169c61186).

-_1302 TILs and counting..._
+_1303 TILs and counting..._

 ---
@@ -1253,6 +1253,7 @@ _1302 TILs and counting..._
 - [File Type Info With File](unix/file-type-info-with-file.md)
 - [Find All Files Matching A Name With fd](unix/find-all-files-matching-a-name-with-fd.md)
 - [Find A File Installed By Brew](unix/find-a-file-installed-by-brew.md)
+- [Find Duplicate Lines In A File](unix/find-duplicate-lines-in-a-file.md)
 - [Find Files With fd](unix/find-files-with-fd.md)
 - [Find Newer Files](unix/find-newer-files.md)
 - [Fix Unlinked Node Binaries With asdf](unix/fix-unlinked-node-binaries-with-asdf.md)
20
unix/find-duplicate-lines-in-a-file.md
Normal file
@@ -0,0 +1,20 @@
+# Find Duplicate Lines In A File
+
+Let's say I have a large file in a Ruby project. I want to find instances of a
+`field` declaration being duplicated throughout the file. Just searching for
+duplicate lines within the file is going to result in all kinds of false
+positives (think, lots of duplicate `end` lines).
+
+What I can do is `grep` for a pattern that will just match on the lines that
+are `field` declarations. The results of the `grep` can then be piped to `sort`
+which will order them. This ordering will mean that any duplicates are placed
+next to each other. Lastly, I'll pipe the sorted lines to `uniq` with the `-d`
+flag which will filter the results down to just those lines that are repeated.
+
+Here is what the whole thing looks like:
+
+```
+$ grep -o "field :[a-zA-Z_][a-zA-Z_0-9]*" file.rb | sort | uniq -d
+```
+
+See `man uniq` for more details on the available flags.
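To try the pipeline from the TIL end to end, here is a minimal sketch that runs it against a small throwaway `file.rb`. The class name, the field names, and the temporary directory are all invented for illustration and are not part of the original commit. It also runs the same command without `sort`, to show what that step contributes.

```shell
# Hypothetical sample input: the class and field names are made up.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file.rb" <<'EOF'
class UserType
  field :id, ID, null: false
  field :name, String
  field :email, String
  field :name, String, null: true
end
EOF

# Same pipeline as in the TIL: keep only the `field :name` part of each
# declaration, sort so repeats become adjacent, then print repeated lines.
duplicates=$(grep -o "field :[a-zA-Z_][a-zA-Z_0-9]*" "$tmpdir/file.rb" | sort | uniq -d)
echo "with sort:    '$duplicates'"

# Without sort, the two `field :name` matches are separated by
# `field :email`, so uniq never sees them side by side.
unsorted=$(grep -o "field :[a-zA-Z_][a-zA-Z_0-9]*" "$tmpdir/file.rb" | uniq -d)
echo "without sort: '$unsorted'"

# Clean up the throwaway directory.
rm -r "$tmpdir"
```

With this sample input the first result is `field :name` and the second is empty, because `uniq` only compares adjacent lines. Because `grep -o` prints only the matched text, the two `:name` declarations count as duplicates even though their types and options differ.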