mirror of https://github.com/jbranchaud/til synced 2026-01-16 21:48:02 +00:00

Compare commits

1 Commits

Author SHA1 Message Date
nick-w-nick
e2f0cfe471 Merge 295fe153ad into 96c394c198 2025-02-03 16:51:40 -05:00
2 changed files with 1 additions and 27 deletions

@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.
For a steady stream of TILs, [sign up for my newsletter](https://crafty-builder-6996.ck.page/e169c61186).
-_1585 TILs and counting..._
+_1584 TILs and counting..._
See some of the other learning resources I work on:
- [Ruby Operator Lookup](https://www.visualmode.dev/ruby-operators)
@@ -1498,7 +1498,6 @@ See some of the other learning resources I work on:
- [Count The Lines In A CSV Where A Column Is Empty](unix/count-the-lines-in-a-csv-where-a-column-is-empty.md)
- [Count The Number Of Matches In A Grep](unix/count-the-number-of-matches-in-a-grep.md)
- [Count The Number Of ripgrep Pattern Matches](unix/count-the-number-of-ripgrep-pattern-matches.md)
-- [Count The Number Of Words On A Webpage](unix/count-the-number-of-words-on-a-webpage.md)
- [Create A File Descriptor with Process Substitution](unix/create-a-file-descriptor-with-process-substitution.md)
- [Create A Sequence Of Values With A Step](unix/create-a-sequence-of-values-with-a-step.md)
- [Curl With Cookies](unix/curl-with-cookies.md)

@@ -1,25 +0,0 @@
# Count The Number Of Words On A Webpage

I was reading through a couple sections of the `postfix` documentation and I
was astounded at how large the webpage is, and that is just for the `main.cf`
file format.

Curiosity got the best of me and I wanted to get a sense of the magnitude of
the page. A word count seemed like a good measure.

Using `pandoc` and a couple other unix utilities, I was able to quickly get
that number.

```bash
curl -s http://www.postfix.org/postconf.5.html\#virtual_mailbox_maps | pandoc -f html -t plain | wc -w
88383
```

Generically, that is:

```bash
curl -s url | pandoc -f html -t plain | wc -w
```

Pandoc produces a plain-text version of the HTML page that was pulled in by
`curl` and then we use `wc` to get a word (`-w`) count.
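
If the pipeline gets reused, it could be wrapped in a small shell function. The following is only a sketch: the name `count_words_on_page` is hypothetical, and the extra `curl` flags (`-f` to fail on HTTP errors, `-L` to follow redirects) and the `pipefail` setting are additions that the original one-liner does not use.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the original TIL): wraps the
# curl | pandoc | wc pipeline so it can be called with any URL.
count_words_on_page() {
  if [ "$#" -ne 1 ]; then
    echo "usage: count_words_on_page URL" >&2
    return 2
  fi

  (
    # Assumption: we want a non-zero exit status when curl or pandoc
    # fails. pipefail is set in a subshell so the caller's shell
    # options are left untouched.
    set -o pipefail
    # -s hides the progress meter, -f fails on HTTP errors,
    # -L follows redirects.
    curl -sfL "$1" | pandoc -f html -t plain | wc -w
  )
}
```

Calling it looks like `count_words_on_page "http://www.postfix.org/postconf.5.html"`. Quoting the URL avoids having to escape characters such as `#` by hand.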