17

I would like to find out how to do this with MongoDB

I have documents with names as "file1", "file2", "file22", "file11" (name can be anything, there is no particular pattern) I ran the query to get all documents sorted by name and the result is not as expected.

> db.mydata.find().sort({"name":1});                                                                                                                          
{ "_id" : ObjectId("571e5a787e88d30b20b7857c"), "name" : "file1" }                                                                                            
{ "_id" : ObjectId("571e5a8c7e88d30b20b7857d"), "name" : "file11" }                                                                                           
{ "_id" : ObjectId("571e5a977e88d30b20b7857f"), "name" : "file2" }                                                                                            
{ "_id" : ObjectId("571e5a937e88d30b20b7857e"), "name" : "file22" } 

What is expected is (alphabetic / natural order)

{ "_id" : ObjectId("571e5a787e88d30b20b7857c"), "name" : "file1" }                                                                                            
{ "_id" : ObjectId("571e5a977e88d30b20b7857f"), "name" : "file2" }                                                                                           
{ "_id" : ObjectId("571e5a8c7e88d30b20b7857d"), "name" : "file11" }
{ "_id" : ObjectId("571e5a937e88d30b20b7857e"), "name" : "file22" }

As per my finding, there are other ways to sort like using aggregate + $project and $meta: "textScore", but I haven't succeeded so far.

UPDATE: An application of this problem: sort the folders / files by names Windows Explorer, Folders sorted by Name

6220119
  • 809
  • 10
  • 21
  • does all of them have `file` common in them?? – kryshna Apr 25 '16 at 20:27
  • No, @kryshna, this is a simplified set of data. – 6220119 Apr 25 '16 at 22:07
  • 4
    I'm not sure how `file1 < file2 < file11 < ...` is considered the alphabetic order? In no dictionary will you find `A < B < AA`. – Soubhik Mondal Apr 29 '16 at 17:33
  • 1
    I wouldn't say what you want to do is natural order. Natural order for mongodb refers to the natural in which docs are inserted into the db: https://docs.mongodb.org/v3.0/reference/method/cursor.sort/#return-natural-order – Simon Apr 29 '16 at 17:35
  • I know that. That is why I emphasized that I want alphabetical order, not chronological update order. Its application can be seen daily like sorting folder names. Nobody would want to see folders ordered like "1, 11, 2, 22, 3". – 6220119 Apr 29 '16 at 18:24
  • 1
    @BlazeSahlzen, I really don't want to discuss on the correct "term", there is already an article for this (http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/). I changed the title of the question and added the "natural-sort" tag so we should focus on the solution for the problem. – 6220119 Apr 29 '16 at 21:25
  • Found [this](http://stackoverflow.com/questions/14720484/mongoose-mongodb-custom-sort). – Soubhik Mondal Apr 30 '16 at 06:36
  • "db.eval" is deprecated since 3.0, so we don't want to use that. https://docs.mongodb.org/manual/reference/method/db.eval/ – 6220119 Apr 30 '16 at 18:20
  • 1
    If all agree that MongoDB does not offer this, and it has to be done in JavaScript, then [this answer](http://stackoverflow.com/a/15479354/5459839) is the way to go. – trincot May 01 '16 at 10:09
  • @trincot, sorting on the client side is not ok unless you get all data items from the server side. If the number of data items is big, it will cause performance problems. We need data sorted and paginated first, then only the required part of data is returned. – 6220119 May 02 '16 at 05:20
  • @6220119, well I suppose you know whether the number of items is big or not? If client side is out of the question, then you must design a proper database. Allowing to store any unstructured data and then hope to structure it is inefficient, whether you do it on the server or the client. You should *store* it in a structured way, upon *insertion* of the data. – trincot May 02 '16 at 06:44
  • @trincot, suppose you have to design your database for an online file browser, to list all the folders / files, E.g. Dropbox file browser, Windows Explorer, what structure you will provide for the file name itself? As your suggestion, does it mean we need to parse the file name ahead of time and provide some structure to it, even when file name can be anything? – 6220119 May 02 '16 at 17:02
  • But file names are typically sorted alphabetically, not with a "natural" sort order, nor should they. If people want to put structured information in their file names, then they have to take care of the format, such as padding numbers with zeroes, etc. A natural sort might even be counter-intuitive for some people. – trincot May 02 '16 at 18:14
  • @trincot, please see the updated screenshot in the question, we are trying to accomplish something like that, Microsoft had long applied the natural sort into file explorer and it affected user experience in a good way. I know a different group of people will have a different idea about this. But unfortunately, we are dealing with this kind of people, that natural sorting order is a natural expectation. – 6220119 May 02 '16 at 18:32

1 Answers1

8

MongoDB doesn't provide a way to do this out of the box, but you still have two options:

The first is a client-side processing using the Array.prototype.sort method to sort the array result.

db.mydata.find().toArray().sort((a, b) => { 
    var x = Number(a.name.match(/\d+/g)[0]); 
    var y = Number(b.name.match(/\d+/g)[0]);
    return x === y ? 0 :( x < y ? -1 : 1 );
})

The second which is what I suggest you to do is normalize your documents with an extra field that hold the the digits in the "name" as integer and sort your documents using that value. This means that, you will need to update your documents in order to add that field, and the best way to do this is using the $set update operator and "bulk operations" for maximum efficiency. That being said, from MongoDB server version 3.2 you need to use the collection.bulkWrite method to accomplish this.

var requests = [];

db.mydata.find({}, { "name": 1 } ).forEach(doc => { 
    var fileId = Number(doc.name.match(/\d+/g)[0]); // return number from "name" value
    requests.push({
        "updateOne": { 
            "filter": { "_id": doc._id }, 
            "update": { "$set": { "fileId": fileId } } 
        } 
    }); 
    // Execute per 1000 operations and re-init the requests queue
    if( requests.length === 1000 ) 
        db.mydata.bulkWrite(requests); 
})

// Clean up queues
if (requests.length > 0) 
    db.mydata.bulkWrite(requests);

From MongoDB server version 2.6 you need to use the now deprecated Bulk API.

var bulk = db.mydata.initializeUnorderedBulkOp();
var count = 0;

db.collection.find({}, { "name": 1 }).forEach(function(doc) {
    var fileId = Number(doc.name.match(/\d+/g)[0]); 
    bulk.find({"_id": doc._id}).updateOne({ 
        "$set": { "fileId": fileId } 
    });
    count++;
    if (count % 1000 === 0) {
        bulk.execute();
        bulk = db.mydata.initializeUnorderedBulkOp();
    }
})

if (count > 0) 
    bulk.execute();

From MongoDB server version 2.4 onwards you need a different approach.

db.collection.find({}, { "name": 1 }).forEach(function(doc) {
    var fileId = Number(doc.name.match(/\d+/g)[0]); 
    db.collection.update(
        { "_id": doc._id },
        {"$set": { "fileId": fileId } } 
    );
})

After any of this operation, your documents now look like this:

{ "_id" : ObjectId("571e5a787e88d30b20b7857c"), "name" : "file1", "fileId" : 1 }
{ "_id" : ObjectId("571e5a8c7e88d30b20b7857d"), "name" : "file11", "fileId" : 11 }
{ "_id" : ObjectId("571e5a977e88d30b20b7857f"), "name" : "file2", "fileId" : 2 }
{ "_id" : ObjectId("571e5a937e88d30b20b7857e"), "name" : "file22", "fileId" : 22 }

Now, you can easily sort your documents using the .sort method.

db.mydata.find({}, { "name": 1 } ).sort( { "fileId": 1 } )

which produces the following result:

{ "_id" : ObjectId("571e5a787e88d30b20b7857c"), "name" : "file1" }
{ "_id" : ObjectId("571e5a977e88d30b20b7857f"), "name" : "file2" }
{ "_id" : ObjectId("571e5a8c7e88d30b20b7857d"), "name" : "file11" }
{ "_id" : ObjectId("571e5a937e88d30b20b7857e"), "name" : "file22" }
styvane
  • 59,869
  • 19
  • 150
  • 156
  • It's nice to understand more about bulk update operations. Introducing extra field(s) is the way to go if we can find out the pattern of sorted value. However, in this case, the value can be anything, like normal strings, IP Address, ... About the client side approach, it will be a problem if we do pagination. Unless you can return all the data to the client side (which might create performance issue), this approach will not produce expected result. – 6220119 May 02 '16 at 05:11
  • @6220119 As I said normalization is definitely the way to go because the client-side operation will cause a drop of performance in the application. Also finding a pattern should be a problem because the "name" holds value of same type. For example with regular string, you can sort you documents by length of the string in ascending order then alphabetically, which means that your extra field will hold the length. But this is definitely the solution to your problem. – styvane May 02 '16 at 06:25
  • Sorting by length and then alphabetically do not yield the correct result. See http://imgur.com/wPR39Mw for some inspiration. And look like this has become another question? How to use mongoDB to store string value for performant sorting operation? – 6220119 May 02 '16 at 17:10
  • Since poster states _name can be anything, there is no particular pattern_, the schema modification should split the `name` into `base` and `fileId`. Then sort accordingly: `sort({base:1, fileId:1})`. – zapatero May 03 '16 at 04:37