
In the Elasticsearch documentation, under the Term Vectors API:

Field statistics
Setting field_statistics to false (default is true) will omit:

document count (how many documents contain this field)
sum of document frequencies (the sum of document frequencies for all terms in this field)
sum of total term frequencies (the sum of total term frequencies of each term in this field)
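To make the three definitions concrete, here is a small sketch that computes them by hand on a toy corpus. It is not Elasticsearch code: the function name `field_statistics`, the `keyword` flag, and the sample documents are all hypothetical, and a lowercase whitespace split stands in for a real analyzer. The output keys match the names the Term Vectors API uses under `term_vectors.<field>.field_statistics` (`doc_count`, `sum_doc_freq`, `sum_ttf`).

```python
from collections import Counter


def field_statistics(docs, field, keyword=False):
    """Hand-computed field statistics for one field (illustrative sketch only).

    Assumptions: documents are plain dicts; analyzed text is tokenized with a
    lowercase whitespace split; keyword=True treats the whole value as a single
    untouched term, roughly like a keyword field.
    """
    doc_freq = Counter()  # term -> number of documents containing the term
    total_tf = Counter()  # term -> total occurrences across all documents
    doc_count = 0         # documents with at least one term in this field
    for doc in docs:
        value = doc.get(field)
        if not value:
            continue  # document does not have this field
        tokens = [value] if keyword else value.lower().split()
        if not tokens:
            continue
        doc_count += 1
        total_tf.update(tokens)       # every occurrence counts
        doc_freq.update(set(tokens))  # each term counts once per document
    return {
        "doc_count": doc_count,
        "sum_doc_freq": sum(doc_freq.values()),
        "sum_ttf": sum(total_tf.values()),
    }


docs = [
    {"title": "The quick fox"},
    {"title": "the lazy dog the"},
    {"body": "no title here"},  # no "title" field, so not counted
]
print(field_statistics(docs, "title"))
# {'doc_count': 2, 'sum_doc_freq': 6, 'sum_ttf': 7}
print(field_statistics(docs, "title", keyword=True))
# {'doc_count': 2, 'sum_doc_freq': 2, 'sum_ttf': 2}
```

In the analyzed case, "the" appears in 2 documents and 3 times in total, so it contributes 2 to `sum_doc_freq` and 3 to `sum_ttf`; the other four terms contribute 1 each. Equivalently, `sum_doc_freq` is the number of distinct terms per document added up (3 + 3), and `sum_ttf` is the total number of tokens (3 + 4). When each document holds a single keyword value, all three numbers collapse to the number of documents that have the field.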

I don't get this part.

I've been experimenting, but no matter how carefully I check, I don't understand what these fields represent.

To my understanding, document count is how many documents contain the field (e.g. fields=name), and sum of total term frequencies is the total count of all terms in this field, but I'm not sure I understand these fields accurately.

Checking my main index, I have a field called title.keyword. I assumed that when I request /index_sample/_termvectors/1?fields=title.keyword, I would get the total number of documents that have this field (doc_count = 45,000), but it returns a much lower count than I expect (doc_count = 17,000).
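One likely reason for the lower number: the Elasticsearch Term Vectors documentation states that term and field statistics are not exact, because deleted documents are not taken into account and the statistics are retrieved only from the shard that holds the requested document. The sketch below simulates that effect, assuming a hypothetical index with 3 primary shards in which all 45,000 documents have the field. The `shard_for` helper is a stand-in: real Elasticsearch routing hashes the routing value (the `_id` by default) with murmur3, while `zlib.crc32` is used here only to stay deterministic and dependency-free.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    # Hypothetical stand-in for Elasticsearch routing (the real formula uses
    # murmur3); it only needs to spread ids deterministically across shards.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def per_shard_doc_count(doc_ids_with_field, num_shards):
    # Count, per shard, how many documents containing the field live there.
    counts = defaultdict(int)
    for doc_id in doc_ids_with_field:
        counts[shard_for(doc_id, num_shards)] += 1
    return dict(counts)


# Hypothetical index: 45,000 documents, all with the field, on 3 primary shards.
doc_ids = [str(i) for i in range(1, 45_001)]
counts = per_shard_doc_count(doc_ids, num_shards=3)
shard_of_doc_1 = shard_for("1", 3)

print("index-wide doc_count:", len(doc_ids))
print("doc_count seen from doc 1's shard:", counts[shard_of_doc_1])
```

Under these assumptions, the statistics for document 1 describe only its own shard, so they are meaningful as relative measures rather than index-wide totals. If the goal is simply the index-wide number of documents that have `title.keyword`, a `_count` request with an `exists` query on that field is one way to get it. If the mapping was created dynamically, the default `ignore_above: 256` on `.keyword` sub-fields may also leave long titles unindexed, which would lower the count further.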

Could anyone explain this to me with some simple examples? There is almost no third-party documentation on this, and it's driving me crazy.

Thank you!

John Lee

1 Answer


AFAIK, the Term Vectors API fetches information about the term vectors of the supplied document. To get index-level information (which is very expensive), check out my plugin here: https://github.com/nirmalc/es-termstat, or jprante's plugin: https://github.com/jprante/elasticsearch-index-termlist

Nirmal
  • I don't understand the difference between the field statistics values. Is there other documentation where I can read up on this? I will look into yours and jprante's plugins; I'm sure they will be very useful for analyzing my index. – John Lee Jul 06 '20 at 04:47