1

I have a collection in MongoDB.(Technically it is the cosmosDB API for MongoDB but in theory that shouldn’t matter.) The collection contains nested arrays of strings. These strings contain brackets inside them, and I need to remove them from the strings inside the arrays. Sample (fake) data:

{
    “Thing_id”: “100”,
    “Thing_name”: “Thing100”
    “Comments”: [
        “Good: [Thing100 is awesome]”,
        “Bad: [I will never buy Thing100 again.]”
    ]
},
{
    “Thing_id”: “101”,
    “Thing_name”: “Thing101”
    “Comments”: [
        “Comparative: [Thing101 is so much better than Thing100.]”,
        “Bad: [Who designed such piece of …]”
    ]
}
]

Any above syntax errors are because I am typing it on my phone…my apologies if it is not well-formed json/bson. To recap, I want to remove just the brackets in each value in the comments arrays. I am using pymongo to connect to the DB, and also python 3.7.4.

I have searched for how to do this, but all I can find are how to update the whole value in the array instead of replacing a character in the string inside the array.

FindAndModify() looked promising until I found out it will only do the first document found.

Is the only option to query the collection and loop through each document?

rickhg12hs
  • 10,638
  • 6
  • 24
  • 42
compx
  • 15
  • 7
  • _"Technically it is the cosmosDB API for MongoDB but in theory that shouldn’t matter."_ Well, Azure Cosmos DB [_**"Fails 67% of compatibility tests."**_](https://www.mongodb.com/cloud/atlas/compare). Also see [Azure Cosmos DB feature support documentation](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/feature-support-42). – rickhg12hs Jul 09 '23 at 23:44
  • What have you _tried_? This sounds like a relatively straightforward `update` using an aggregation pipeline and [string expression operators](https://www.mongodb.com/docs/manual/reference/operator/aggregation/#string-expression-operators). I'm sure there are questions on this site about performing similar types of updates – user20042973 Jul 10 '23 at 02:16
  • @user20042973 Ironically I think I started down that path except without a valid pipeline…so I can’t really call it an attempt since I knew it wouldn’t work. I went down multiple rabbit trails before ending up here. So now that I know I was likely heading in the right direction in the beginning, I will look into it further. I will share my solution when I figure it out. – compx Jul 10 '23 at 03:23
  • @rickhg12hs Thank you this is going to be extremely useful going forward. – compx Jul 10 '23 at 03:26

1 Answers1

2

I don't have access to Azure Cosmos DB, so I can't test. If you are using pymongo and want to update all documents in the collection, you probably want to use update_many.

Here are a couple parameters for update_many that may work for you.

filter = {'$expr': {'$gt': [{'$size': '$Comments'}, 0]}}
update = [
  {
    "$set": {
      # rewrite Comments
      "Comments": {
        "$map": {
          "input": "$Comments",
          "as": "comment",
          "in": {
            "$reduce": {
              "input": {
                # get all matches without "[" or "]"
                # every comment needs to be a string
                "$regexFindAll": {
                  "input": "$$comment",
                  # backslashes "\" are used to "escape" brackets
                  # the number of "\" required may depend on platform, etc.
                  "regex": "[^\\[\\]]*"
                }
              },
              "initialValue": "",
              "in": {
                # concat all matches
                "$concat": ["$$value", "$$this.match"]
              }
            }
          }
        }
      }
    }
  }
]

You can try this (slightly modified for the language/platform) on mongoplayground.net.

rickhg12hs
  • 10,638
  • 6
  • 24
  • 42