0

Here is a simple example of the kind of query my app makes

So you have a notepad and notes collection. The note documents are linked to the notepad document using the notepadId.

So a typically query looks like:

Notes.find({notepadId: notepad._id})

A user can have hundreds of notes per notepad. And they switch between notepads often. Also users collaborate on notepads.

I also have a mechanism that counts all the notes in the notepad on the server on page load. This is because I need to know how many pages to show since I use pagination.

Notes.find({notepadId: notepad._id}).count()

I am not very familiar with MongoDB indexing and have tried to search for 'Indexing linked documents' and can't find any info on indexing linked documents.

Is this a common practice to index linked document fields?

So in this case I would want to set an index on the Notes collection, on the field notepadId

Good idea, bad idea, why?

Nearpoint
  • 7,202
  • 13
  • 46
  • 74

2 Answers2

1

The documents become "linked" only on application layer. There is nothing special in notepadId field in mongodb itself. Indexing by this field will make counting extremely efficient, as it is a covered query, which requires no disk IO.

Alex Blex
  • 34,704
  • 7
  • 48
  • 75
  • Thanks. Quick follow up. If I have a common query like so `Notes.find({notepadId: notepad._id, archived: false}).count()` Would I also want to index the `archived ` field? – Nearpoint Nov 20 '16 at 00:48
  • If the query contains one indexed field, and one non-indexed field, does indexing help at all in this scenario? – Nearpoint Nov 20 '16 at 01:20
  • Yes, the index still works, but it wont be as efficient, as all documents that match `notepadId` will be scanned to check `archived` field. Please read the page behind the link in the answer, I put it there for purpose. To benefit from covered query you need to ensure the query can be resolved by indexed data only. – Alex Blex Nov 20 '16 at 02:01
  • Index {notepadId: 1, archived: 1} would be perfect index for you, because it answers to both of queries. One with only with notepadId and that other where you have that archived too. However, it will NOT answer to query where only archived is queried. – JJussi Nov 20 '16 at 20:27
0

Those answers where (index covered query) was talked about, forgot to tell that, to get fully index covered result, you must do projection what excludes _id -field.

Let me explain.

In full index covered query, we read ONLY index and not anything else from disk. If we have f.ex. Notes.createIndex({notepadId: 1, archived: 1}) (what covers both of those two queries), and we do Notes.find({notepadId: notepad._id}).count() there will be two different actions. First there is "find" what collects result set and then "count" what counts how many documents was in that result set.

Find ALWAYS includes (in it's result set) "_id" field, if it is NOT excluded in projection. Because our index don't have _id -field, only pointer to documents disk location, find first finds required document using index and then jumps to disk location where that index entry points, to read document's _id. Not what we wanted. To prevent this, we need projection. So query:

Notes.find({notepadId: notepad._id},{"_id":0, "notepadId":1}).count()

will return (in find section) only list of "notepadId"'s and not anything else.

The other query would be: Notes.find({notepadId: notepad._id, archived: false},{"_id":0, "notepadId":1, archived:1}).count()

You can test this easily doing two different queries:

Notes.find({notepadId: notepad._id}).explain()
Notes.find({notepadId: notepad._id},{"_id":0, "notepadId":1}).explain()

and comparing results.

JJussi
  • 1,540
  • 12
  • 12