I am using the Solr 4 trunk build, a couple of days old.
According to the Wiki page for the LukeRequestHandler (first example output), we're supposed to get a count of the tokens for each field, or for any specified field. I want to use this to count how many times each word appears across all my documents. For example, if the word 'is' appears in two MS Word documents, twice in the first and three times in the second, I would expect output like this:
<lst name="text">
<str name="type">text</str>
<str name="schema">IT-M---------</str>
<str name="index">(unstored field)</str>
<int name="docs">2</int>
<int name="distinct">42</int>
<lst name="topTerms">
<int name="is">5</int>
That's because the word "is" occurs a total of five times across the two documents. However, what I actually get is <int name="is">2</int>. I presume this is because the count is per document: the word occurs in two distinct documents.
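To make the two interpretations concrete, here is a minimal sketch in plain Python. This is not Solr or Lucene code; the sample documents and the `term_counts` helper are made up purely for illustration. It computes both the number of documents that contain a word and the total number of occurrences of that word:

```python
from collections import Counter


def term_counts(docs):
    """Return (document frequency, total term frequency) for every word.

    Hypothetical helper: splits on whitespace and lowercases, which is far
    simpler than what a real Solr analyzer does.
    """
    doc_freq = Counter()    # number of documents containing the word
    total_freq = Counter()  # number of occurrences summed over all documents
    for text in docs:
        words = text.lower().split()
        total_freq.update(words)     # every occurrence counts
        doc_freq.update(set(words))  # each document counts at most once
    return doc_freq, total_freq


# Two made-up documents: "is" appears twice in the first, three times in the second.
docs = [
    "this is what it is",
    "it is what is is",
]

doc_freq, total_freq = term_counts(docs)
print(doc_freq["is"])    # 2, the number I am actually getting
print(total_freq["is"])  # 5, the number I want
```

The first value matches what I see in the output, and the second is the summed count I am trying to obtain.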
But again, according to the Wiki, we're supposed to get a total count, summed across all the documents, which is what I actually want.
How can I get a total number of times each and every word in all indexed documents appears?
Reference: