mirror of
https://github.com/jbranchaud/til
synced 2026-01-03 07:08:01 +00:00
Add Count The Number Of Words On A Webpage as a Unix TIL
This commit is contained in:
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.
|
||||
|
||||
For a steady stream of TILs, [sign up for my newsletter](https://crafty-builder-6996.ck.page/e169c61186).
|
||||
|
||||
_1584 TILs and counting..._
|
||||
_1585 TILs and counting..._
|
||||
|
||||
See some of the other learning resources I work on:
|
||||
- [Ruby Operator Lookup](https://www.visualmode.dev/ruby-operators)
|
||||
@@ -1498,6 +1498,7 @@ See some of the other learning resources I work on:
|
||||
- [Count The Lines In A CSV Where A Column Is Empty](unix/count-the-lines-in-a-csv-where-a-column-is-empty.md)
|
||||
- [Count The Number Of Matches In A Grep](unix/count-the-number-of-matches-in-a-grep.md)
|
||||
- [Count The Number Of ripgrep Pattern Matches](unix/count-the-number-of-ripgrep-pattern-matches.md)
|
||||
- [Count The Number Of Words On A Webpage](unix/count-the-number-of-words-on-a-webpage.md)
|
||||
- [Create A File Descriptor with Process Substitution](unix/create-a-file-descriptor-with-process-substitution.md)
|
||||
- [Create A Sequence Of Values With A Step](unix/create-a-sequence-of-values-with-a-step.md)
|
||||
- [Curl With Cookies](unix/curl-with-cookies.md)
|
||||
|
||||
25
unix/count-the-number-of-words-on-a-webpage.md
Normal file
25
unix/count-the-number-of-words-on-a-webpage.md
Normal file
@@ -0,0 +1,25 @@
|
||||
# Count The Number Of Words On A Webpage
|
||||
|
||||
I was reading through a couple sections of the `postfix` documentation and I
|
||||
was astounded at how large the webpage is, and that is just for the `main.cf`
|
||||
file format.
|
||||
|
||||
Curiosity got the best of me and I wanted to get a sense of the magnitude of
|
||||
the page. A word count seemed like a good measure.
|
||||
|
||||
Using `pandoc` and a couple other unix utilities, I was able to quickly get
|
||||
that number.
|
||||
|
||||
```bash
|
||||
curl -s http://www.postfix.org/postconf.5.html\#virtual_mailbox_maps | pandoc -f html -t plain | wc -w
|
||||
88383
|
||||
```
|
||||
|
||||
Generically, that is:
|
||||
|
||||
```bash
|
||||
curl -s url | pandoc -f html -t plain | wc -w
|
||||
```
|
||||
|
||||
Pandoc produces a plain-text version of the HTML page that was pulled in by
|
||||
`curl` and then we use `wc` to get a word (`-w`) count.
|
||||
Reference in New Issue
Block a user