77

I'm currently searching for an application or a script that does a correct word count for a LaTeX document.

So far I have only found scripts that work on a single file. What I want is a script that safely ignores LaTeX keywords and also traverses linked files, i.e. follows \include and \input links, to produce a correct word count for the whole document.

With vim, I currently use ggVGg CTRL+G, but that only shows the count for the current file and does not ignore LaTeX keywords.

Does anyone know of any script (or application) that can do this job?

Thom Wiggers
Andreas Grech
  • 4
    Try finding a tool that counts the words in your published PDF -- most LaTeX word-counts fail on understanding what actually gets printed. – icio Jun 04 '10 at 14:25
  • 2
    @icio - Hyphenated words, math formulas, headers and footers, all make it quite difficult to count the words in a PDF. – Geoff Jun 08 '10 at 12:14
  • 1
    @Geoff - I agree, but this is a common downfall between word-counters for PDF and TeX documents so far as I am aware. – icio Jun 09 '10 at 14:21
  • 4
    Those who end up here via a search may want to look at the more recent answer on TeX.se: http://tex.stackexchange.com/questions/534/is-there-any-way-to-do-a-correct-word-count-of-a-latex-document – isomorphismes Jan 10 '14 at 22:40

9 Answers

77

I use texcount. The webpage has a Perl script to download (and a manual).

It follows the .tex files that the document pulls in with \input or \include (see -inc), supports macros, and has many other nice features.

When following included files you get details for each separate file as well as a total. For example, here is the total output for a 12-page document of mine:

TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19

If you're only interested in the total, use the -total argument.
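
If you want the numbers in a script rather than on screen, a report like the one above is easy to parse. The following is only a minimal sketch: it assumes the report consists of `Label: number` lines exactly as in the sample, and the function name `parse_texcount_summary` is made up for this illustration, not part of texcount.

```python
import re


def parse_texcount_summary(report: str) -> dict:
    """Collect 'Label: number' lines from a texcount-style report into a dict."""
    counts = {}
    for line in report.splitlines():
        # Lines without a trailing integer (e.g. "TOTAL COUNT") are skipped.
        match = re.fullmatch(r"\s*([^:]+):\s*(\d+)\s*", line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


# Part of the sample report quoted above. In practice the text would come from
# running something like `texcount -inc -total main.tex` (main.tex is a
# placeholder name) and capturing its standard output.
sample = """TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
"""

summary = parse_texcount_summary(sample)
total_words = (
    summary["Words in text"]
    + summary["Words in headers"]
    + summary["Words in float captions"]
)
print(total_words)  # 4618
```

If the report contains one block per file, later blocks overwrite earlier ones in this sketch, so it is safest to feed it only the total block.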

Geoff
  • But does it follow links to `\include` and `\input` files? – Andreas Grech Jun 07 '10 at 18:48
  • 3
    Yes, that's what the `-inc` parameter does (I'll edit my response). – Geoff Jun 07 '10 at 22:11
  • Brilliant. Just tested out this script and it works great! Cheers Geoff – Andreas Grech Jun 08 '10 at 16:54
  • Cool. I haven't played with the macro support. If you have macros which produce text, you will need to look into that section. – Geoff Jun 08 '10 at 17:35
  • what about bibtex and counting your references? – dorien Jun 22 '13 at 23:18
  • @dorien, what do you mean? You want to know how many references you have? – Geoff Jun 24 '13 at 12:36
  • 1
    If that's what you want, I think you can do `grep bibcite paper.aux | wc`, where `paper.aux` should be the proper `aux` file for your document, but you'll need to compile the document to get the `aux` file. – Geoff Jun 24 '13 at 12:41
  • @Geoff, this command gives me a very low number. Too low to be all my references. I already pasted the pages in a text editor to count, but thanks for replying though. The reason I mention it is that journals often ask for a word count excluding references. – dorien Jul 03 '13 at 16:53
  • Okay, gotcha. I was trying to get the number of references (not the number of words, but the unique citations). Is it right for that? – Geoff Jul 03 '13 at 18:53
  • Could texcount work for a LaTeX document in Russian? – Glory to Russia Feb 16 '14 at 11:11
  • 1
    @DmitriPisarenko - See the FAQ: http://app.uio.no/ifi/texcount/faq.html#languages – Geoff Jul 24 '14 at 14:04
  • It doesn't seem to be up-to-date or functional for me (I'm getting a deprecation warning followed by errors related to it). – Maxie Berkmann Dec 02 '19 at 20:32
13

I went with icio's comment and did a word-count on the pdf itself by piping the output of pdftotext to wc:

pdftotext file.pdf - | wc -w
Andreas Grech
  • 3
    Be careful with this. I believe a word that is hyphenated across two lines will show up as 2 words, not one. Headers and footers will also be counted. Look at the output from `pdftotext` and see if it is okay for you. If you want an exact count, I would not use this solution. – Geoff Jun 07 '10 at 13:20
  • 1
    This solution is close enough if you just want to get a rough feel for how big documents are. I would agree with Geoff in that it's not suitable for holding yourself to specific publishing related word counts. – Joseph Lisee May 04 '11 at 23:55
  • I like your idea because it includes bibliography items! – dorien Jun 22 '13 at 23:19
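
The hyphenation problem raised in the comments can be reduced by rejoining words that were split at a line break before counting. This is a rough sketch, not a complete fix: it assumes the text from `pdftotext` keeps the hyphen at the end of the line, it does nothing about headers and footers, and the function name `count_words_dehyphenated` is invented for this example.

```python
import re


def count_words_dehyphenated(text: str) -> int:
    """Count whitespace-separated words after rejoining line-break hyphens."""
    # "docu-\nment" becomes "document". The join only happens when a letter
    # precedes the hyphen and a lowercase letter starts the next line.
    joined = re.sub(r"(?<=[A-Za-z])-\n(?=[a-z])", "", text)
    return len(joined.split())


sample = "A correct word count for a LaTeX docu-\nment is hard to get.\n"

naive = len(sample.split())               # what `wc -w` would report
fixed = count_words_dehyphenated(sample)
print(naive, fixed)  # 13 12
```

A real compound such as "well-known" that happens to break at the hyphen is also joined, but it counts as one word either way, so the total is unaffected.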
7
latex file.tex
dvips -o - file.dvi | ps2ascii | wc -w

should give you a fairly accurate word count.

aioobe
  • If you use pdflatex, just do `pdftops file.tex` and then `ps2ascii|wc -w` I compared this count to the count in Word and of all the ones in here, it was the one with the closest number. See my comparisons in my response – fiacobelli Feb 15 '14 at 05:32
  • 1
    @fiacobelli it should be `pdftops file.pdf` – prab4th Nov 30 '17 at 07:53
6

To add to @aioobe,

If you use pdflatex, just do

pdftops file.pdf
ps2ascii file.ps|wc -w

I compared this count to the count in Microsoft Word for a document that Word reports as 1599 words. pdftotext produced a text with 1700+ words. texcount did not include the references and produced 1088 words. ps2ascii returned 1603 words, 4 more than Word.

I say that's a pretty good count. I am not sure where the 4-word difference comes from, though. :)
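
To put those numbers in perspective, here is the deviation from Word's count as a percentage. Only the two exact figures quoted above are used; the helper name `percent_deviation` is just for this illustration.

```python
def percent_deviation(count: int, reference: int) -> float:
    """Signed deviation of `count` from `reference`, in percent."""
    return 100.0 * (count - reference) / reference


word_reference = 1599  # count reported by Microsoft Word
tool_counts = {"ps2ascii": 1603, "texcount": 1088}

for tool, count in tool_counts.items():
    print(f"{tool}: {percent_deviation(count, word_reference):+.2f}%")
# ps2ascii: +0.25%
# texcount: -31.96%
```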

fiacobelli
5

In the Texmaker interface, you can get the word count by right-clicking in the PDF preview.

Franck Dernoncourt
3

Overleaf has a word count feature, in both Overleaf v1 and Overleaf v2.

Franck Dernoncourt
1

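I use the following Vim script:

function! WC()
    " Strip the TeX markup with detex, then count the remaining words.
    " shellescape() keeps file names with spaces or quotes intact.
    let filename = shellescape(expand("%"))
    let cmd = "detex " . filename . " | wc -w | perl -pe 'chomp; s/ +//;'"
    let result = system(cmd)
    echo result . " words"
endfunction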

… but it doesn’t follow links. This would basically entail parsing the TeX file to get all linked files, wouldn’t it?

The advantage over the other answers is that it doesn’t have to produce an output file (PDF or PS) to compute the word count so it’s potentially (depending on usage) much more efficient.

Although icio’s comment is theoretically correct, I found that the above method gives quite accurate estimates for the number of words. For most texts, it’s well within the 5% margin that is used in many assignments.
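
For what it's worth, collecting the linked files does not need a full TeX parser if the document only uses plain `\input{...}` and `\include{...}`. The sketch below makes several assumptions: file names are resolved relative to the main file's directory, comments start at an unescaped `%`, and things like `\includeonly`, verbatim blocks, or package-specific import commands are ignored. The function name `expand_includes` and the file names in the demo are made up.

```python
import re
import tempfile
from pathlib import Path

INCLUDE_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def expand_includes(path, root=None, seen=None) -> str:
    """Return the text of `path` with input/include targets inlined recursively."""
    path = Path(path).resolve()
    root = path.parent if root is None else root
    seen = set() if seen is None else seen
    if path in seen:  # guard against include cycles
        return ""
    seen.add(path)

    text = path.read_text(encoding="utf-8")
    # Drop comments first so that commented-out includes are not followed.
    text = re.sub(r"(?<!\\)%.*", "", text)

    def inline(match):
        target = root / match.group(1).strip()
        if target.suffix != ".tex":
            target = target.with_name(target.name + ".tex")
        # Missing files are skipped silently in this sketch.
        return expand_includes(target, root, seen) if target.exists() else ""

    return INCLUDE_RE.sub(inline, text)


# Small demo with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "chapters").mkdir()
    (base / "main.tex").write_text(
        "Intro text.\n\\input{chapters/one}\n% \\input{skipped}\n",
        encoding="utf-8",
    )
    (base / "chapters" / "one.tex").write_text("Chapter one text.\n", encoding="utf-8")
    flat = expand_includes(base / "main.tex")

print(flat.split())  # ['Intro', 'text.', 'Chapter', 'one', 'text.']
```

The flattened text could then be written to a temporary file and handed to `detex` and `wc -w` as in the function above.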

Konrad Rudolph
  • Cheers for the script but following links is a must for me since my document is pretty much structured with `\include`s – Andreas Grech Jun 04 '10 at 14:58
1

If the use of a vim plugin suits you, the vimtex plugin has integrated the texcount tool quite nicely.

Here is an excerpt from their documentation:

:VimtexCountLetters       Shows the number of letters/characters or words in
:VimtexCountWords         the current project or in the selected region. The
                          count is created with `texcount` through a call on
                          the main project file similar to: >

                            texcount -nosub -sum [-letter] -merge -q -1 FILE
<
                          Note: Default arguments may be controlled with
                                |g:vimtex_texcount_custom_arg|.

                          Note: One may access the information through the
                                function `vimtex#misc#wordcount(opts)`, where
                                `opts` is a dictionary with the following
                                keys (defaults indicated): >

                                'range' : [1, line('$')]
                                'count_letters' : 0/1
                                'detailed' : 0
<
                                If `detailed` is 0, then it only returns the
                                total count. This makes it possible to use for
                                e.g. statusline functions. If the `opts` dict
                                is not passed, then the defaults are assumed.

                                             *VimtexCountLetters!*
                                             *VimtexCountWords!*
:VimtexCountLetters!      Similar to |VimtexCountLetters|/|VimtexCountWords|, but
:VimtexCountWords!        show separate reports for included files.  I.e.
                          presents the result of: >

                            texcount -nosub -sum [-letter] -inc FILE
<

The nice part about this is how extensible it is. On top of counting the number of words in your current file, you can make a visual selection (say two or three paragraphs) and then only apply the command to your selection.

Benjamin Chausse
0

For a very basic article class document I just look at the number of matches for a regex to find words. I use Sublime Text, so this method may not work for you in a different editor, but I just hit Ctrl+F (Command+F on Mac) and then, with regex enabled, search for

(^|\s+|"|((h|f|te){)|\()\w+

which should ignore text declaring a floating environment or captions on figures as well as most kinds of basic equations and \usepackage declarations, while including quotations and parentheticals. It also counts footnotes and \emphasized text and will count \hyperref links as one word. It's not perfect, but it's typically accurate to within a few dozen words or so. You could refine it to work for you, but a script is probably a better solution, since LaTeX source code isn't a regular language. Just thought I'd throw this up here.
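
The same regex can be run outside the editor. Here is a quick sketch in Python, with two assumptions: `re.MULTILINE` is used so that `^` matches at every line start, as it does in the editor's search, and the `{` is escaped, which means the same thing here and avoids any ambiguity with a repetition count.

```python
import re

# Same pattern as above, with the brace escaped.
WORD_RE = re.compile(r'(^|\s+|"|((h|f|te)\{)|\()\w+', re.MULTILINE)


def count_matches(text: str) -> int:
    """Number of non-overlapping matches of the word regex in `text`."""
    return sum(1 for _ in WORD_RE.finditer(text))


sample = (
    "Counting words in \\emph{LaTeX} source is hard.\n"
    "\\usepackage{graphicx}\n"
)
count = count_matches(sample)
print(count)  # 7
```

The `\usepackage` line contributes nothing, and `LaTeX` is picked up through the `h{` at the end of `\emph{`. Note that any command ending in `te{` is treated like `\footnote{`, so a `\cite{key}` also adds one match.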

ocket8888