diff --git a/README.md b/README.md
index eddf31c..7e749e9 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.
 
 For a steady stream of TILs, [sign up for my newsletter](https://crafty-builder-6996.ck.page/e169c61186).
 
-_1302 TILs and counting..._
+_1303 TILs and counting..._
 
 ---
 
@@ -1253,6 +1253,7 @@ _1302 TILs and counting..._
 - [File Type Info With File](unix/file-type-info-with-file.md)
 - [Find All Files Matching A Name With fd](unix/find-all-files-matching-a-name-with-fd.md)
 - [Find A File Installed By Brew](unix/find-a-file-installed-by-brew.md)
+- [Find Duplicate Lines In A File](unix/find-duplicate-lines-in-a-file.md)
 - [Find Files With fd](unix/find-files-with-fd.md)
 - [Find Newer Files](unix/find-newer-files.md)
 - [Fix Unlinked Node Binaries With asdf](unix/fix-unlinked-node-binaries-with-asdf.md)
diff --git a/unix/find-duplicate-lines-in-a-file.md b/unix/find-duplicate-lines-in-a-file.md
new file mode 100644
index 0000000..633de5c
--- /dev/null
+++ b/unix/find-duplicate-lines-in-a-file.md
@@ -0,0 +1,20 @@
+# Find Duplicate Lines In A File
+
+Let's say I have a large file in a Ruby project. I want to find instances of a
+`field` declaration being duplicated throughout the file. Just searching for
+duplicate lines within the file is going to result in all kinds of false
+positives (think, lots of duplicate `end` lines).
+
+What I can do is `grep` for a pattern that will just match on the lines that
+are `field` declarations. The results of the `grep` can then be piped to `sort`
+which will order them. This ordering will mean that any duplicates are placed
+next to each other. Lastly, I'll pipe the sorted lines to `uniq` with the `-d`
+flag which will filter the results down to just those lines that are repeated.
+
+Here is what the whole thing looks like:
+
+```
+$ grep -o "field :[a-zA-Z_][a-zA-Z_0-9]*" file.rb | sort | uniq -d
+```
+
+See `man uniq` for more details on the available flags.
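
To sanity-check the pipeline from the new TIL, here is a minimal sketch that runs it against a throwaway file. The sample class and its `field` lines are made up for illustration and stand in for `file.rb`; only the `grep -o ... | sort | uniq -d` pipeline comes from the TIL itself.

```shell
#!/bin/sh
# Hypothetical sample file standing in for file.rb from the TIL.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
class User
  field :name, type: String
  field :email, type: String
  def greet
  end
  field :name, type: String
  def leave
  end
end
EOF

# Same pipeline as the TIL:
#   grep -o  prints only the matched "field :<identifier>" text,
#   sort     places identical matches next to each other,
#   uniq -d  prints one copy of each line that is repeated.
dupes="$(grep -o "field :[a-zA-Z_][a-zA-Z_0-9]*" "$sample" | sort | uniq -d)"
echo "$dupes"

# Clean up the temporary file.
rm -f "$sample"
```

With this sample, the only line printed should be `field :name`. The repeated `end` lines never reach `uniq` because `grep` filters them out first, which is the false-positive case the TIL describes.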