I'm working on inverted indexing, and my question is: in the final step, should we return the total number of documents the word appeared in, or the identifier of each document? For example, if the word "Hello" appeared in 3 documents (document A, document B, and document C), should I return 3 or A, B, C?
2 Answers
An index implies that it gives you a lookup to something, not just a number. A frequency count would only give you the number of occurrences of a word.
By the way, you can get the number from A, B, C, but not the other way around.
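To illustrate the point, here is a minimal Python sketch, assuming documents are plain strings keyed by an ID and that splitting on whitespace is good enough as tokenization. The names `build_inverted_index` and `docs` are made up for this example. The index stores the document IDs, and the count is derived from them.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample data matching the question's example.
docs = {
    "A": "Hello world",
    "B": "hello again",
    "C": "say hello",
    "D": "goodbye world",
}

index = build_inverted_index(docs)
postings = sorted(index["hello"])  # which documents contain the word
doc_count = len(postings)          # the count is derived from the postings
print(postings, doc_count)         # prints ['A', 'B', 'C'] 3
```

Going the other way is not possible: knowing only the number 3 does not tell you which documents matched.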

Peter Lawrey
That's totally up to you!
If you just need to return the total number of documents a certain word appears in, then you won't even need an inverted index. All you would need is a mapping from words to counts. That would take much less computation and space than an inverted index.
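As a rough sketch of that simpler mapping, assuming the same kind of input (strings keyed by a document ID) and whitespace tokenization, something like the following would do. The function name `document_frequencies` and the sample data are hypothetical.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each word, how many documents contain it at least once."""
    counts = Counter()
    for text in docs.values():
        # set() ensures a word repeated inside one document counts only once.
        counts.update(set(text.lower().split()))
    return counts


# Hypothetical sample data.
docs = {
    "A": "Hello hello world",
    "B": "hello again",
    "C": "say hello",
    "D": "goodbye world",
}

df = document_frequencies(docs)
print(df["hello"])  # prints 3
```

Only one integer is kept per word here, so there is no way to answer "which documents?" afterwards.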
If you're working on an exercise in Information Retrieval (or doing a proof of concept, etc.), it seems to me that you would also need to return the docs where a given word was found; that's Boolean retrieval.
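A small sketch of what Boolean retrieval over an inverted index could look like, assuming postings are stored as Python sets and queries are simple AND/OR combinations of single words. The helper names `boolean_and` and `boolean_or` are invented for illustration.

```python
def build_inverted_index(docs):
    """Map each lowercased word to the set of document IDs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term: intersection of the postings sets."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one term: union of the postings sets."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Hypothetical sample data.
docs = {
    "A": "Hello world",
    "B": "hello again",
    "C": "say hello",
    "D": "goodbye world",
}

index = build_inverted_index(docs)
print(sorted(boolean_and(index, "hello", "world")))  # prints ['A']
print(sorted(boolean_or(index, "hello", "world")))   # prints ['A', 'B', 'C', 'D']
```

Queries like these need the document IDs themselves, which is why a count alone would not be enough for this use case.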

Mouhcine