4

I want to find all key names from a collection that partially match a certain string.

The closest I got was to check if a certain key exists, but that's an exact match:

db.collection.find({ "fkClientID": { $exists:1 }})

I'd like to get all keys that start with fk instead.

Danziger
  • 19,628
  • 4
  • 53
  • 83
mskw
  • 10,063
  • 9
  • 42
  • 64

2 Answers2

4

You can do that using mapReduce:

To get just the field names at root level:

db.collection.mapReduce(function () {
    Object.keys(this).map(function(key) {
        if (key.match(/^fk/)) emit(key, null);

        // OR: key.indexOf("fk") === 0
    });
}, function(/* key, values */) {
    // No need for params or to return anything in the 
    // reduce, just pass an empty function.
}, { out: { inline: 1 }});

This will output something like this:

{
    "results": [{
        "_id": "fkKey1",
        "value": null
    }, {
        "_id": "fkKey2",
        "value": null
    }, {
        "_id": "fkKey3",
        "value": null
    }],
    "timeMillis": W,
    "counts": {
        "input": X,
        "emit": Y,
        "reduce": Z,
        "output": 3
    },
    "ok" : 1
}

To get field names and any or all (whole doc) its values:

db.test.mapReduce(function () {
    var obj = this;

    Object.keys(this).map(function(key) {
        // With `obj[key]` you will get the value of the field as well.
        // You can change `obj[key]` for:
        //  - `obj` to return the whole document.
        //  - `obj._id` (or any other field) to return its value.

        if (key.match(/^fk/)) emit(key, obj[key]);
    });
}, function(key, values) {
    // We can't return values or an array directly yet:

    return { values: values };
}, { out: { inline: 1 }});

This will output something like this:

{
    "results": [{
        "_id": "fkKey1",
        "value": {
            "values": [1, 4, 6]
        }
    }, {
        "_id": "fkKey2",
        "value": {
            "values": ["foo", "bar"]
        }
    }],
    "timeMillis": W,
    "counts": {
        "input": X,
        "emit": Y,
        "reduce": Z,
        "output": 2
    },
    "ok" : 1
}

To get field names in subdocuments (without path):

To do that you will have to use store JavaScript functions on the Server:

db.system.js.save({ _id: "hasChildren", value: function(obj) {
    return typeof obj === "object";
}});

db.system.js.save({ _id: "getFields", value: function(doc) {
    Object.keys(doc).map(function(key) {
        if (key.match(/^fk/)) emit(key, null);

        if (hasChildren(doc[key])) getFields(doc[key])
    });
}});

And change your map to:

function () {
    getFields(this);
}

Now run db.loadServerScripts() to load them.

To get field names in subdocuments (with path):

The previous version will just return field names, not the whole path to get them, which you will need if what you want to do is rename those keys. To get the path:

db.system.js.save({ _id: "getFields", value: function(doc, prefix) {
    Object.keys(doc).map(function(key) {
        if (key.match(/^fk/)) emit(prefix + key, null);

        if (hasChildren(doc[key]))
            getFields(doc[key], prefix + key + '.')
    });
}});

And change your map to:

function () {
    getFields(this, '');
}

To exclude overlapping path matches:

Note that if you have a field fkfoo.fkbar, it will return fkfoo and fkfoo.fkbar. If you don't want overlapping path matches, then:

db.system.js.save({ _id: "getFields", value: function(doc, prefix) {
    Object.keys(doc).map(function(key) {
        if (hasChildren(doc[key]))
            getFields(doc[key], prefix + key + '.')
        else if (key.match(/^fk/)) emit(prefix + key, null);
    });
}});

Going back to your question, renaming those fields:

With this last option, you get all the paths that include keys that start with fk, so you can use $rename for that.

However, $rename doesn't work for those that contain arrays, so for those you could use forEach to do the update. See MongoDB rename database field within array

Performance note:

MapReduce is not particularly fast thought, so you may want to specify { out: "fk_fields"} to output the results into a new collection called fk_fields and query those results later, but that will depend on your use case.

Possible optimisations for specific cases (consistent schema):

Also, note that if you know that the schema of your documents is always the same, then you just need to check one of them to get its fields, so you can do that adding limit: 1 to the options object or just retrieving one document with findOne and reading its fields in the application level.

Community
  • 1
  • 1
Danziger
  • 19,628
  • 4
  • 53
  • 83
  • The failing here is the return would be one document per matched key name rather than identifying the documents that have a key that matches. Also I think you missed the intent of the `.limit(0)` in the question. Since `.limit(0)` actually does not limit results at all. – Neil Lunn May 19 '17 at 22:36
  • @NeilLunn Yes, I know. I have updated my answer to be able to retrieve just the field names, both field name and all values or field name and all full documents. The `.limit(1)` was just a suggestion in case the schema is consistent across all the documents. – Danziger May 19 '17 at 23:14
  • 1
    Actually If you read the OP's comments on my own answer here, it turns out that a single reduced list of the field names is all they are actually after. – Neil Lunn May 19 '17 at 23:18
3

If you have the latest MongoDB 3.4.4 then you can use $objectToArray in an aggregate statement with $redact as the the most blazing fast way this can possibly be done with native operators. Not that scanning the collection is "fast". but as fast as you get for this:

db[collname].aggregate([
  { "$redact": {
    "$cond": {
      "if": {
        "$gt": [
          { "$size": { "$filter": {
            "input": { "$objectToArray": "$$ROOT" },
            "as": "doc",
            "cond": {
              "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
            }
          }}},
          0
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }}
 ])

The presently undocumented $objectToArray translates an "object" into "key" and "value" form in an array. So this:

{ "a": 1, "b": 2 }

Becomes this:

[{ "k": "a", "v": 1 }, { "k": "b", "v": 2 }]

Used with $$ROOT which is a special variable referring to the current document "object", we translate to an array so the values of "k" can be inspected.

Then it's just a matter of applying $filter and using $substr to get the preceding characters of the "key" string.


For the record, this would be the MongoDB 3.4.4 optimal way of obtaining an unique list of the matching keys:

db[collname].aggregate([
  { "$redact": {
    "$cond": {
      "if": {
        "$gt": [
          { "$size": { "$filter": {
            "input": { "$objectToArray": "$$ROOT" },
            "as": "doc",
            "cond": {
              "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
            }
          }}},
          0
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$project": { 
    "j": {
      "$filter": {
        "input": { "$objectToArray": "$$ROOT" },
         "as": "doc",
         "cond": {
           "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
         }
      }
    }
  }},
  { "$unwind": "$j" },
  { "$group": { "_id": "$j.k" }}
 ])

That's the safe provision, which is considering that the key may not be present in all documents and that there could possibly be multiple keys in the document.

If you are absolutely certain that you "always" have the key present in the document and that there will only be one, then you can shorten to just $group:

db[colname].aggregate([
  { "$group": { 
    "_id": {
      "$arrayElemAt": [
        { "$map": {
          "input": { "$filter": {
            "input": { "$objectToArray": "$$ROOT" },
             "as": "doc",
             "cond": {
               "$eq": [ { "$substr": [ "$$doc.k", 0, 2 ] }, "fk" ]
             }
          }},
          "as": "el",
          "in": "$$el.k"
        }},
        0
      ]
    }
  }}
 ])

The most efficient way in earlier versions would be using the $where syntax that allows a JavaScript expression to evaluate. Not that anything that evaluates JavaScript is the "most" efficient thing you can do, but analyzing "keys" as opposed to "data" is not optimal for any data store:

db[collname].find(function() { return Object.keys(this).some( k => /^fk/.test(k) ) })

The inline function there is just shell shorthand and this could also be written as:

db[collname].find({ "$where": "return Object.keys(this).some( k => /^fk/.test(k) )" })

The only requirement for $where is that the expression returns a true value for any document you want to return, so the documents return unaltered.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • It says operator not found for $objectToArray when I tried it. – mskw May 19 '17 at 22:46
  • db[collname].find({ "$where": "return Object.keys(this).some( k => /^fk/.test(k) )" }) Works however – mskw May 19 '17 at 22:46
  • Follow up on that, how do project just the found "fk" key that I found instead of the entire document? – mskw May 19 '17 at 22:47
  • @mskw As noted in the answer, you need the very latest [MongoDB 3.4.4 release](https://docs.mongodb.com/manual/release-notes/3.4/) where the `$obectjToArray` operator was added. So new there is not a manual page yet. What are you looking for actually? Returning all the documents that match but "only" the fields that match the expression? – Neil Lunn May 19 '17 at 22:51
  • 1
    I'm trying to create a list of all keys that contain fk. (foreign keys), then next step is to replace fk with pk (primary key) and find the doc that contain the pk. Then replace it with an OBJectID. Basically, I had a MDB MSAccess and need to convert it to mongo db. Is a pain to do it by hand, so I'm devising a script, and this is part one of that script. – mskw May 19 '17 at 22:56
  • 2
    @mskw If all you **need** is to extract the unique list of key names as starting with the characters, and you are **not** using MongoDB 3.4.4 then the best bet is `.mapReduce()`. The `$where` clause cannot alter document structure, nor can you "project" based on the pattern of a key name. As for replacing the values, then you will need an individual update statement for each possible key name. There are optimal ways to do that, but that's another question. – Neil Lunn May 19 '17 at 23:02
  • @mskw Well I knew that already since I did test the code of course. If you are simply after just the unique list of matching properties, then I also added listings to do just that. As noted before, it is preferable in performance terms ( and possibly in terms of the returned cursor if there are a lot of results ) to run the aggregate statement as opposed to executing JavaScript on the server, as `$where` or `.mapReduce()` does. – Neil Lunn May 20 '17 at 04:24
  • Awesome answer, but I actually don't understand what the new query you added is supposed to be doing. – mskw May 21 '17 at 04:24
  • There are many operators I don't know about that you've used, so I will need time to understand them. – mskw May 21 '17 at 04:27