I'm analyzing texts. Those texts have annotations (e.g. "chapter", "scenery", ...). The annotations are stored in my MongoDB collection annotations, e.g.:
    {
        start: 1,
        stop: 10000,
        type: "chapter",
        details: {
            number: 1,
            title: "Where it all began"
        }
    },
    {
        start: 10001,
        stop: 20000,
        type: "chapter",
        details: {
            number: 2,
            title: "Lovers"
        }
    },
    {
        start: 1,
        stop: 5000,
        type: "scenery",
        details: {
            descr: "castle"
        }
    },
    {
        start: 5001,
        stop: 15000,
        type: "scenery",
        details: {
            descr: "forest"
        }
    }
Challenge 1: For a given position in the text, I'd like to find all annotations. For example, querying for character 1234 should tell me that
- it is within chapter one
- it takes place in the castle
Challenge 2: I'd also like to query for ranges. For example, querying for characters from 9800 to 10101 should tell me that it touches chapter 1, chapter 2 and the scenery forest.
Challenge 3: Similar to challenge 2, I'd like to match only those annotations that are completely covered by the query range. For example, querying for characters from 9800 to 30000 should return only the document for chapter 2.
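To make the three conditions precise, here is a minimal sketch in plain JavaScript of what I mean, run against the sample documents above. The helper names (containsPoint, overlaps, coveredBy and the *Filter builders) are made up for illustration, and I am assuming that start and stop are both inclusive bounds.

```javascript
// Sample documents copied from the collection shown above.
const annotations = [
  { start: 1, stop: 10000, type: "chapter", details: { number: 1, title: "Where it all began" } },
  { start: 10001, stop: 20000, type: "chapter", details: { number: 2, title: "Lovers" } },
  { start: 1, stop: 5000, type: "scenery", details: { descr: "castle" } },
  { start: 5001, stop: 15000, type: "scenery", details: { descr: "forest" } },
];

// Challenge 1: the annotation contains the position (assumption: inclusive bounds).
const containsPoint = (a, pos) => a.start <= pos && pos <= a.stop;

// Challenge 2: the annotation overlaps the query range [from, to].
const overlaps = (a, from, to) => a.start <= to && a.stop >= from;

// Challenge 3: the annotation lies completely inside the query range [from, to].
const coveredBy = (a, from, to) => a.start >= from && a.stop <= to;

// The same conditions written as MongoDB filter documents (hypothetical helpers).
const pointFilter = (pos) => ({ start: { $lte: pos }, stop: { $gte: pos } });
const overlapFilter = (from, to) => ({ start: { $lte: to }, stop: { $gte: from } });
const coveredFilter = (from, to) => ({ start: { $gte: from }, stop: { $lte: to } });

// Short label for printing results: title for chapters, descr for sceneries.
const label = (a) => a.type + ":" + (a.details.title || a.details.descr);

console.log(annotations.filter((a) => containsPoint(a, 1234)).map(label));
console.log(annotations.filter((a) => overlaps(a, 9800, 10101)).map(label));
console.log(annotations.filter((a) => coveredBy(a, 9800, 30000)).map(label));
```

In the shell, the filter builders would be used as, for example, db.annotations.find(overlapFilter(9800, 10101)). The sketch only pins down which documents should match; it says nothing about how well such a query can use an index.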
For challenge 1 I tried simply using $lt and $gt, e.g.:
db.annotations.find({start: {$lt: 1234}, stop: {$gt: 1234}});
But I realized that the index is only used for the key start, even though I have a compound index on start and stop. Is there a way to create more suitable indexes for the three problems I mentioned?
I briefly thought of geospatial indexes, but I haven't used them yet. I also only need a one-dimensional version of them.