1
0
mirror of https://github.com/jbranchaud/til synced 2026-01-03 15:18:01 +00:00
Files
til/unix/count-the-number-of-words-on-a-webpage.md

26 lines
769 B
Markdown

# Count The Number Of Words On A Webpage
I was reading through a couple sections of the `postfix` documentation and I
was astounded at how large the webpage is, and that is just for the `main.cf`
file format.
Curiosity got the best of me and I wanted to get a sense of the magnitude of
the page. A word count seemed like a good measure.
Using `pandoc` and a couple other unix utilities, I was able to quickly get
that number.
```bash
curl -s http://www.postfix.org/postconf.5.html\#virtual_mailbox_maps | pandoc -f html -t plain | wc -w
88383
```
Generically, that is:
```bash
curl -s url | pandoc -f html -t plain | wc -w
```
Pandoc produces a plain-text version of the HTML page that was pulled in by
`curl` and then we use `wc` to get a word (`-w`) count.