diff --git a/README.md b/README.md
index 051c3ac..6b743e8 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.
 
 For a steady stream of TILs, [sign up for my newsletter](https://crafty-builder-6996.ck.page/e169c61186).
 
-_1584 TILs and counting..._
+_1585 TILs and counting..._
 
 See some of the other learning resources I work on:
 - [Ruby Operator Lookup](https://www.visualmode.dev/ruby-operators)
@@ -1498,6 +1498,7 @@ See some of the other learning resources I work on:
 - [Count The Lines In A CSV Where A Column Is Empty](unix/count-the-lines-in-a-csv-where-a-column-is-empty.md)
 - [Count The Number Of Matches In A Grep](unix/count-the-number-of-matches-in-a-grep.md)
 - [Count The Number Of ripgrep Pattern Matches](unix/count-the-number-of-ripgrep-pattern-matches.md)
+- [Count The Number Of Words On A Webpage](unix/count-the-number-of-words-on-a-webpage.md)
 - [Create A File Descriptor with Process Substitution](unix/create-a-file-descriptor-with-process-substitution.md)
 - [Create A Sequence Of Values With A Step](unix/create-a-sequence-of-values-with-a-step.md)
 - [Curl With Cookies](unix/curl-with-cookies.md)
diff --git a/unix/count-the-number-of-words-on-a-webpage.md b/unix/count-the-number-of-words-on-a-webpage.md
new file mode 100644
index 0000000..b36f2b3
--- /dev/null
+++ b/unix/count-the-number-of-words-on-a-webpage.md
@@ -0,0 +1,25 @@
+# Count The Number Of Words On A Webpage
+
+I was reading through a couple sections of the `postfix` documentation and I
+was astounded at how large the webpage is, and that is just for the `main.cf`
+file format.
+
+Curiosity got the best of me and I wanted to get a sense of the magnitude of
+the page. A word count seemed like a good measure.
+
+Using `pandoc` and a couple other unix utilities, I was able to quickly get
+that number.
+
+```bash
+curl -s http://www.postfix.org/postconf.5.html\#virtual_mailbox_maps | pandoc -f html -t plain | wc -w
+ 88383
+```
+
+Generically, that is:
+
+```bash
+curl -s url | pandoc -f html -t plain | wc -w
+```
+
+Pandoc produces a plain-text version of the HTML page that was pulled in by
+`curl` and then we use `wc` to get a word (`-w`) count.
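
The pipeline in the new TIL could be wrapped in a small shell helper for reuse. This is only a sketch: the function names `count_words` and `count_words_on_webpage` are hypothetical and not part of the commit, and it assumes `curl` and `pandoc` are on the `PATH`. The counting stage is split out so it can be checked offline, and it trims the leading padding that some `wc` implementations print before the number.

```bash
# Hypothetical helpers; names are illustrative, not from the original TIL.

# Count whitespace-separated words on stdin and strip the space padding
# that some `wc` implementations put in front of the number.
count_words() {
  wc -w | tr -d ' '
}

# Fetch a URL quietly, convert the HTML to plain text with pandoc,
# then count the words. Requires curl and pandoc to be installed.
count_words_on_webpage() {
  curl -s "$1" | pandoc -f html -t plain | count_words
}

# Offline check of the counting stage only (no network needed):
printf 'one two  three\nfour\n' | count_words   # prints 4
```

Quoting `"$1"` means a URL containing `#` or `?` can be passed in single quotes rather than backslash-escaped as in the diff.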