How can I optimize my schema to remove the useless default index on "_id"?

Question

This is my document structure:

db.like
{
  _id: objectid, /* not null */ 
  id_from: int64, /* not null */
  id_to: int64, /* not null */
  date: datetime, /* default NOW not null */
}

db.like.createIndex( {"id_to": 1, "id_from": 1}, {unique: true} );  
db.like.createIndex( {"id_to": 1, "date": -1}, {unique: false} );

I load documents in only one of these two ways:

db.like.find({$and: [{id_to:xxx}, {id_from:yyyy}]})

or

db.like.find({id_to:xxx}).sort({date:-1});

and later I shard the collection like this:

sh.shardCollection( "<database>.like", {"id_to": 1, "id_from": 1}, true );

As you can see, I don't use the index on "_id" at all. I'm a little worried about having an index on "_id" that seems to be useless. Is there a way to optimize my schema, or is it better to leave it like this?
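A simplified JavaScript model of MongoDB's compound-index prefix rule makes the point concrete: equality fields must cover the leading keys of an index, and sort keys must follow immediately, either all in the index's direction or all reversed. It ignores range predicates, multikey indexes and the real query planner, and the function name `indexSupports` is hypothetical. It shows that each of the two queries is served by one of the secondary indexes, while a `{ _id: 1 }` index serves neither.

```javascript
// Simplified model: can a compound index serve a query with equality
// filters on `eqFields` and an optional sort, without an in-memory sort?
// This is not MongoDB's actual planner logic.
function indexSupports(indexKeys, eqFields, sortSpec = {}) {
  const keys = Object.keys(indexKeys);
  const eq = new Set(eqFields);

  // Equality fields must occupy the first eq.size index keys, in any order.
  const prefix = keys.slice(0, eq.size);
  if (prefix.length !== eq.size || !prefix.every((k) => eq.has(k))) {
    return false;
  }

  // Sort keys must come right after the equality prefix, in the same order...
  const sortFields = Object.keys(sortSpec);
  const rest = keys.slice(eq.size, eq.size + sortFields.length);
  if (rest.length !== sortFields.length || !sortFields.every((f, i) => rest[i] === f)) {
    return false;
  }

  // ...with every direction matching the index, or every direction reversed.
  const same = sortFields.every((f) => sortSpec[f] === indexKeys[f]);
  const reversed = sortFields.every((f) => sortSpec[f] === -indexKeys[f]);
  return same || reversed;
}

const pairIndex = { id_to: 1, id_from: 1 };
const dateIndex = { id_to: 1, date: -1 };

console.log(indexSupports(pairIndex, ["id_to", "id_from"]));    // true
console.log(indexSupports(dateIndex, ["id_to"], { date: -1 })); // true
console.log(indexSupports(pairIndex, ["id_to"], { date: -1 })); // false
```

In this model the `_id` index never appears as the answer for either query, which is exactly why it looks useless here.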

NOTE: the solution must work with sharding, so the solution given by clcto seems to be bad for this! The suggestion was to declare _id as a document, like:

{
   _id : { 
      to : int64,
      from : int64
   },
   date : datetime
}

but I'm quite sure that with such a declaration, a query like

db.like.find({id_to:xxx}).sort({date:-1});

will be run on all shards.
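The concern can be sketched with a simplified JavaScript model of query routing for a range shard key: a query is routed to specific shards only when it has an equality condition on the first shard key field, otherwise it is broadcast to every shard. Real mongos routing also handles range conditions and more cases; the helper names `isEquality` and `isTargeted` are hypothetical.

```javascript
// Simplified model of routing for a range shard key. Not the real mongos
// algorithm: only equality on the first shard key field is considered.

// A plain value or an exact-match document counts as equality; an operator
// document such as { $gt: 5 } does not (real mongos can still target ranges).
function isEquality(value) {
  if (value === undefined) return false;
  if (value !== null && typeof value === "object" && !(value instanceof Date)) {
    return !Object.keys(value).some((k) => k.startsWith("$"));
  }
  return true;
}

// True if the filter pins down the first shard key field, so the query can
// be sent only to the shard(s) owning that key range.
function isTargeted(shardKey, filter) {
  const firstField = Object.keys(shardKey)[0];
  return isEquality(filter[firstField]);
}

// Original schema and shard key: filtering on id_to is targeted.
console.log(isTargeted({ id_to: 1, id_from: 1 }, { id_to: 42 }));   // true
// Shard key on the whole compound _id: "_id.to" is a different path from
// "_id", so this model (and, AFAIK, mongos) cannot route it.
console.log(isTargeted({ _id: 1 }, { "_id.to": 42 }));              // false
// An exact match on the whole _id document is targeted.
console.log(isTargeted({ _id: 1 }, { _id: { to: 42, from: 7 } }));  // true
```

In this model, the first query stays targeted with either design, but the sorted query by `to` only stays targeted when the shard key starts with the field it filters on.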

Mongo is designed so that each document needs to have a unique id, which is ensured by the unique index on `_id`. AFAIK, you cannot remove it, but you can set up your document so that `_id` is a document containing `id_to` and `id_from`, since, from the code provided, that pair is guaranteed to be unique. — clcto, Apr 16 '18 at 20:57
@clcto: thanks, but how would you set up _id to be a document containing id_to and id_from and at the same time keep my 2 queries performant? — vostock, Apr 16 '18 at 21:01

clcto · Answer 1 · 2018-04-16T21:18:22.007


Mongo is designed so that each document needs to have a unique id, which is ensured by the unique index on _id. AFAIK, you cannot remove it, but if you are able to change the schema, you can set up your document so that _id is a document containing id_to and id_from, since, from the code provided, that pair is guaranteed to be unique:

{
   _id : { 
      to : int64,
      from : int64
   },
   date : datetime
}

For the indexes: the _id index is already created automatically and enforces the uniqueness of the pair, so you don't need the first one. You can index into the _id document for the second:

db.like.createIndex( {"_id.to": 1, "date": -1}, {unique: false} );

Then your queries would be:

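db.like.find({ _id : { to: xxx, from: yyyy } });  // exact match on the whole _id document; field order matters
db.like.find({ "_id.to": xxx }).sort({date:-1});  // dotted field names must be quoted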
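Because the first query is an exact match on an embedded document, MongoDB compares its fields in order, so `{ from: yyyy, to: xxx }` would not match a stored `{ to: xxx, from: yyyy }`. A minimal JavaScript sketch of helpers that always build `_id` the same way; the names `likeId`, `pairFilter` and `listFilter` are hypothetical, and plain JS numbers stand in for int64 (in the mongo shell you would wrap them in NumberLong).

```javascript
// Build the compound _id with a fixed field order (to, then from), because
// an exact match on an embedded document is order-sensitive.
function likeId(to, from) {
  return { to, from };
}

// Filter for: db.like.find(pairFilter(xxx, yyyy))
function pairFilter(to, from) {
  return { _id: likeId(to, from) };
}

// Filter for: db.like.find(listFilter(xxx)).sort({ date: -1 })
// The dotted path has to be a quoted key in JavaScript.
function listFilter(to) {
  return { "_id.to": to };
}

console.log(JSON.stringify(pairFilter(42, 7))); // {"_id":{"to":42,"from":7}}
console.log(JSON.stringify(listFilter(42)));    // {"_id.to":42}
```

Using the same helper for inserts and lookups keeps the stored field order and the query field order in sync.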

Note: MongoDB requires the _id to be immutable, so if you need to be able to update the original fields id_to and id_from, you cannot use this method.

edited Apr 16 '18 at 21:18

answered Apr 16 '18 at 21:06

clcto

Hmm, interesting. And what about sharding? Would I shard with sh.shardCollection( "<database>.like", {"_id.to": 1, "_id.from": 1}, true ), or simply with sh.shardCollection( "<database>.like", {"_id": 1}, true )? – vostock Apr 16 '18 at 21:09
I'm not familiar with mongo sharding so I cannot answer that. – clcto Apr 16 '18 at 21:10
Is there any documentation about using a document as _id? Are there any drawbacks to this method? – vostock Apr 16 '18 at 21:11
The only drawback I know about is that the `_id` field is immutable. So if these fields need to be updated later, this method cannot be used. – clcto Apr 16 '18 at 21:17
From https://docs.mongodb.com/manual/core/document/index.html: *"Documents have the following restrictions on field names: The field name _id is reserved for use as a primary key; its value must be unique in the collection, is immutable, and may be of any type other than an array."* – clcto Apr 16 '18 at 21:24
Thanks! But now that I look at it, I'm quite sure (if not completely sure) that sharding will not work :( To shard I need an index, so if I shard on "_id", then I'm sure that db.like.find({ "_id.to": xxx }).sort({date:-1}); will be run against ALL the shards :( – vostock Apr 16 '18 at 21:38
