I know that Lucene creates an index and stores all the data. Can anyone tell me how the data is stored in the flat files? What kind of algorithms and data structures does it use in the backend so that the data can be retrieved quickly?
Viewed 1.6k times
3 Answers
8
I don't know if this is exactly what you asked for, but the general answer is that Lucene implements an inverted index. The specifics of how Lucene stores it on disk are in the file formats documentation (as milan said).
The general idea is that it stores an inverted index data structure plus other auxiliary data structures that help answer queries quickly. For example, it stores a vector of norms for each document and each term's document frequency, from which its IDF (inverse document frequency) is computed. Lucene also stores the actual document fields, but those live outside the inverted index.
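
To make the idea concrete, here is a minimal in-memory sketch of those pieces: postings lists, per-document norms, document frequencies, and stored fields kept separately. It is not Lucene's on-disk format or scoring. The class name `ToyIndex`, the whitespace tokenizer, the `1/sqrt(length)` norm, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class ToyIndex:
    """Toy in-memory inverted index (hypothetical; not Lucene's file format)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.norms = {}                    # doc_id -> length normalization factor
        self.stored = {}                   # doc_id -> original text, outside the index

    def add(self, doc_id, text):
        tokens = text.lower().split()      # naive whitespace tokenizer (assumption)
        for term, tf in Counter(tokens).items():
            self.postings[term][doc_id] = tf
        self.norms[doc_id] = 1.0 / math.sqrt(len(tokens)) if tokens else 0.0
        self.stored[doc_id] = text

    def doc_freq(self, term):
        # Number of documents that contain the term.
        return len(self.postings.get(term, {}))

    def idf(self, term):
        # One common smoothed variant, computed from the document frequency.
        return math.log((1 + len(self.stored)) / (1 + self.doc_freq(term))) + 1.0

    def search(self, query):
        # Only the postings of the query terms are visited, never every document.
        scores = defaultdict(float)
        for term in query.lower().split():
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf * self.idf(term) * self.norms[doc_id]
        return sorted(scores, key=scores.get, reverse=True)


index = ToyIndex()
index.add(1, "Lucene stores an inverted index")
index.add(2, "An inverted index maps terms to documents")
index.add(3, "Stored fields hold the original document")
print(index.search("inverted index"))  # prints [1, 2]
```

The lookup is fast because a query only touches the postings of its own terms. Document 1 ranks above document 2 here only because it is shorter, so its norm is larger.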

Felipe Hummel
4
You can read this book, http://nlp.stanford.edu/IR-book/, to learn about the data structures, algorithms, and models used in information retrieval systems.

naresh
- It is a good entry-level book, though not very relevant to this specific problem. Still a good reference. – linjunhalida Oct 21 '13 at 12:53
- There's also another great information retrieval book that now offers free content: https://ciir.cs.umass.edu/irbook/ – realjin Dec 26 '16 at 01:35