I have a collection with multiple date type fields. I know I can change them based on their key, but is there a way to find all fields that have date as a type and change all of them in one script?

UPDATE

Many thanks to chridam for helping me out. Based on his code I came up with this solution. (Note: I'm on mongo 3.2.9, and some code snippets from chridam's answer wouldn't run as-is. They may be valid on other versions, but they didn't work for me.)

map = function() {
    for (var key in this) { 
        if (key != null && this[key] != null && this[key] instanceof Date){ 
            emit(key, null); 
        }
    }
}

collectionName = "testcollection_copy";

mr = db.runCommand({
    "mapreduce": collectionName,
    "map": map,  
    "reduce": function() {},
    "out": "map_reduce_test" // out is required
}) 

dateFields = db[mr.result].distinct("_id")
printjson(dateFields)

// updating documents
db[collectionName].find().forEach(function (document){
    for (var i = 0; i < dateFields.length; i++) {
        // guard against documents that lack the field (and against re-runs
        // where the value is already a NumberLong)
        if (document[dateFields[i]] instanceof Date) {
            document[dateFields[i]] = new NumberLong(document[dateFields[i]].getTime());
        }
    }
    db[collectionName].save(document);
});

Since the projection approach didn't work for me, I used the above code to update the documents. My only remaining question is: why use bulkWrite?

(Also, getTime() seemed cleaner than subtracting dates.)
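(For reference on the bulkWrite question: save() issues one round trip per document, whereas bulkWrite() sends the updates to the server in batches, which is much faster on large collections. Below is a minimal sketch of the same update loop batched with bulkWrite(), reusing the dateFields list from above; the batch size of 500 is an arbitrary choice.)

// Sketch: the same update batched with bulkWrite() instead of one
// save() round trip per document (batch size 500 is arbitrary)
var ops = [];
db[collectionName].find().forEach(function (doc) {
    var set = {};
    dateFields.forEach(function (field) {
        if (doc[field] instanceof Date) { // skip documents missing the field
            set[field] = new NumberLong(doc[field].getTime());
        }
    });
    if (Object.keys(set).length > 0) { // avoid an empty $set, which errors
        ops.push({ "updateOne": { "filter": { "_id": doc._id }, "update": { "$set": set } } });
    }
    if (ops.length === 500) {
        db[collectionName].bulkWrite(ops);
        ops = [];
    }
});
if (ops.length > 0) db[collectionName].bulkWrite(ops);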

S0m30n3
  • Can you give an example? – chridam Jan 04 '17 at 07:38
  • @chridam Yes, of course. Imagine a collection with fields like this: loginName (string), password (string), validFrom (date), validTo(date), registerDate (date). I'd like to write a script that finds all date fields (so in this case validFrom, validTo, registerDate) and changes them to timestamps/longs. If it's possible. – S0m30n3 Jan 04 '17 at 08:26

1 Answer

An operation like this involves two tasks: one to get a list of the date-type fields via MapReduce, and another to update the collection via the aggregation framework or bulk write operations.

NB: The following methodology assumes all the date fields are at the root level of the document, not inside embedded documents or arrays.

MapReduce

The first thing you need to do is run the following mapReduce operation. It checks, for every document in the collection, which properties are of Date type, and returns a distinct list of those field names:

// define helper function to determine if a value is of Date type
isDate = function(dt) {
    return dt && dt instanceof Date && !isNaN(dt.valueOf());
}

// map function; note that isDate must be passed to the server via the
// command's "scope" option below, otherwise it is undefined inside map
map = function() {
    for (var key in this) { 
        if (isDate(this[key])) {
            emit(key, null); 
        }
    }
}

// variable with collection name
collectionName = "yourCollectionName";

mr = db.runCommand({
    "mapreduce": collectionName,
    "map": map,  
    "reduce": function() {},
    "scope": { "isDate": isDate }, // make the helper available server-side
    "out": "map_reduce_test"       // an "out" parameter is required
}) 

dateFields = db[mr.result].distinct("_id")
printjson(dateFields)

//output: [ "validFrom", "validTo", "registerDate" ]
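Once you have the field list, the intermediate mapReduce output collection is no longer needed and can be dropped:

// optional: clean up the mapReduce output collection
db[mr.result].drop()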

Option 1: Update collection via aggregation framework

You can use the aggregation framework to update your collection, in particular the $addFields pipeline stage available in MongoDB 3.4 and newer. If your MongoDB server version does not support it, you can update your collection with the workaround described in the next option.

The timestamp is calculated using the $subtract arithmetic aggregation operator, with the date field as minuend and the epoch date new Date("1970-01-01") as subtrahend. Subtracting two dates yields their difference in milliseconds, i.e. the Unix timestamp in ms.
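To illustrate (with an arbitrary sample date), subtracting the epoch from a date yields the Unix timestamp in milliseconds:

// illustrative: $subtract on two dates returns their difference in ms
db[collectionName].aggregate([
    { "$limit": 1 },
    { "$project": { "ms": { "$subtract": [ new Date("2017-01-04T00:00:00Z"), new Date("1970-01-01") ] } } }
])
// => { "_id": ..., "ms": NumberLong("1483488000000") }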

The resulting documents of the aggregation pipeline are then written to the same collection via the $out operator, thus updating the collection with the new fields. (Note that $out completely replaces the contents of the target collection.)

In essence, you'd want to end up running the following aggregation pipeline which converts the date fields to timestamps using the above algorithm:

pipeline = [
    {
        "$addFields": {
            "validFrom": { "$subtract": [ "$validFrom", new Date("1970-01-01") ] },
            "validTo": { "$subtract": [ "$validTo", new Date("1970-01-01") ] },
            "registerDate": { "$subtract": [ "$registerDate", new Date("1970-01-01") ] }
        }
    },
    { "$out": collectionName }
]
db[collectionName].aggregate(pipeline)

You can dynamically create the above pipeline array given the list of the date fields as follows:

var addFields = { "$addFields": { } },
    output = { "$out": collectionName };

dateFields.forEach(function(key){
    var subtr = ["$"+key, new Date("1970-01-01")];
    addFields["$addFields"][key] = { "$subtract": subtr };
});

db[collectionName].aggregate([addFields, output])
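You can sanity-check the dynamically built stage with printjson before running the aggregation; it should match the hand-written pipeline above:

printjson(addFields)
// { "$addFields": { "validFrom": { "$subtract": [ "$validFrom", ISODate("1970-01-01T00:00:00Z") ] }, ... } }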

Option 2: Update collection via Bulk

This option is a workaround for when the $addFields stage above is not supported. Use a $project pipeline to compute the new timestamp fields with the same $subtract expression, but instead of writing the results back to the same collection, iterate the cursor returned by aggregate() with forEach() and, for each document, stage an update that is applied with the bulkWrite() method. Batching the updates (here 500 at a time) avoids one round trip per document and keeps the operations array from growing unboundedly.

The following example shows this approach:

ops = []
pipeline = [
    {
        "$project": {
            "validFrom": { "$subtract": [ "$validFrom", new Date("1970-01-01") ] },
            "validTo": { "$subtract": [ "$validTo", new Date("1970-01-01") ] },
            "registerDate": { "$subtract": [ "$registerDate", new Date("1970-01-01") ] }
        }
    }
]

db[collectionName].aggregate(pipeline).forEach(function(doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": { 
                    "validFrom": doc.validFrom,
                    "validTo": doc.validTo,
                    "registerDate": doc.registerDate
                }
            }
        }
    });

    if (ops.length === 500 ) {
        db[collectionName].bulkWrite(ops);
        ops = [];
    }
})

if (ops.length > 0)  
    db[collectionName].bulkWrite(ops);

Using the same approach as Option 1 above, you can create the $project pipeline and the bulk update documents dynamically:

var ops = [],
    project = { "$project": { } };

dateFields.forEach(function(key){
    var subtr = ["$"+key, new Date("1970-01-01")];
    project["$project"][key] = { "$subtract": subtr };
});

setDocFields = function(doc, keysList) { 
    var setObj = { "$set": { } };
    return keysList.reduce(function(obj, key) {  
        obj["$set"][key] = doc[key];
        return obj;
    }, setObj);
}

db[collectionName].aggregate([project]).forEach(function(doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": setDocFields(doc, dateFields)
        }
    });

    if (ops.length === 500 ) {
        db[collectionName].bulkWrite(ops);
        ops = [];
    }
})

if (ops.length > 0)  
    db[collectionName].bulkWrite(ops);
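For a document with the three sample fields, setDocFields(doc, dateFields) produces an update document of the following shape (the values here are hypothetical):

// hypothetical output of setDocFields(doc, ["validFrom", "validTo", "registerDate"])
{
    "$set": {
        "validFrom": NumberLong("1483488000000"),
        "validTo": NumberLong("1514764800000"),
        "registerDate": NumberLong("1480500000000")
    }
}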
chridam
  • Thank you very very very much for the detailed answer. I finally had time to test your code. It helped me a lot, but also I had some problems with it. I have mongo version 3.2.9, I don't know how much it matters. The problems I had are the following: mr needs an out parameter. (It can easily be helped.) isDate is not defined -> so I just put the whole checking to the map function. value is not defined -> I don't know why, but value[key] simply doesn't exist. I used this[key] instead. – S0m30n3 Jan 06 '17 at 19:05
  • I'll update my question soon, with the code I used and with a question regarding it. I'd be glad if you could check it. – S0m30n3 Jan 06 '17 at 19:09
  • The `mapReduce` part is meant to run on the server commandline (in mongo shell). However, if you run that from an application then you need to declare `var mr, isDate` etc – chridam Jan 06 '17 at 19:11
  • I ran the scripts in Robomongo, so that might have been the problem. Thanks. – S0m30n3 Jan 06 '17 at 19:34