I am trying to store a document in a database so that I can quickly look up which words appear at a given position.

Example query: /doc1?start=2,end=5

This would retrieve the second through fifth words of that document. I am open to any type of database; I just want to avoid loading and parsing the whole document on every query.

Currently I am looking at loading the words into something like Elasticsearch or Redis, using a format such as {word:"Apple",index:1} with a hierarchy to denote the document. Is this a useful approach to my problem, or should I be looking elsewhere?

technoSpino

1 Answer

What is the benefit?

If you are already at a document level, it is cheap to read the whole document and extract the words as desired.

The tricky query is "find all documents where word x occurs close to word y", and text search engines such as Xapian and Lucene handle that just fine.

When you want the contents of a single document, the best index is the document id, unless your documents are very long, at which point you may want to break them into chunks of, say, 100 words.
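The chunking idea can be sketched roughly as follows. This is only an illustration: the in-memory dict stands in for whatever key-value store you pick, the `(doc_id, chunk_no)` key layout and the names `split_into_chunks` and `fetch_range` are made up for this example, and positions are assumed to be 1-based and inclusive as in the question.

```python
CHUNK_SIZE = 100  # words per chunk, the figure suggested above


def split_into_chunks(words, chunk_size=CHUNK_SIZE):
    """Map chunk number -> list of words for one document."""
    return {
        i // chunk_size: words[i:i + chunk_size]
        for i in range(0, len(words), chunk_size)
    }


def fetch_range(store, doc_id, start, end, chunk_size=CHUNK_SIZE):
    """Return words start..end (1-based, inclusive), reading only the chunks needed."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    first = (start - 1) // chunk_size
    last = (end - 1) // chunk_size
    words = []
    for chunk_no in range(first, last + 1):
        # A missing key means the range runs past the end of the document.
        words.extend(store.get((doc_id, chunk_no), []))
    offset = first * chunk_size  # position of the first loaded word, 0-based
    return words[start - 1 - offset:end - offset]


# Hypothetical store: a plain dict keyed by (document id, chunk number).
store = {}
doc = [f"w{i}" for i in range(1, 251)]  # a 250-word dummy document
for chunk_no, chunk in split_into_chunks(doc).items():
    store[("doc1", chunk_no)] = chunk

print(fetch_range(store, "doc1", 99, 102))
```

A query that spans a chunk boundary, like the one above, reads two chunks; most short ranges read one, no matter how long the document is.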

How about representing your document as this:

["this", "is", "an", "example", "document"]
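With that representation, answering the example query is a plain slice once the list has been loaded by document id. A minimal sketch, assuming `start` and `end` are 1-based and inclusive as in the question (`words_in_range` is just an illustrative name):

```python
def words_in_range(words, start, end):
    """Return the words at positions start..end, 1-based and inclusive."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return words[start - 1:end]


doc = ["this", "is", "an", "example", "document"]
print(words_in_range(doc, 2, 5))  # ['is', 'an', 'example', 'document']
```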
Has QUIT--Anony-Mousse