Aggregation for counting the occurrence of sub-string in main string in mongodb

Question

I am new to MongoDB, so this might be a noob question.

I want to count the number of times "lupoK" is repeated in the message field, which is "message" : "first lupoK lupoK", using aggregation in MongoDB. I am using the Studio 3T interface.

My document structure is:

{ 
    "_id" : ObjectId("5df9c780b05196da93be262b"), 
    "id" : "61a4c53a-aa99-4336-ab4f-07bb7f618889", 
    "time" : "00:00:45", 
    "username" : "siul", 
    "message" : "***first lupoK lupoK***", 
    "emoticon_place" : [
        {
            "_id" : "128428", 
            "begin" : NumberInt(6), 
            "end" : NumberInt(10)
        }
    ], 
    "fragments" : [
        {
            "text" : "first "
        }, 
        {
            "emoticon" : {
                "emoticon_id" : "128428", 
                "emoticon_set_id" : ""
            }, 
            "text" : "***lupoK***"
        },
        {
            "emoticon" : {
                "emoticon_id" : "128428", 
                "emoticon_set_id" : ""
            }, 
            "text" : "***lupoK***"
        }
    ]
}

Thanks in advance!!!

prasad_ · Accepted Answer · 2019-12-18T15:19:19.483

This works in the mongo shell (assuming the message field exists and is a string):

db.test.aggregate( [
  { 
      $project: { 
          _id: 0, 
          message: 1, 
          count: { 
              $subtract: [ 
                  { $size: { $split: [ "$message", "lupoK" ] } }, 1 
              ] 
          } 
      } 
  }
] )
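
The pipeline above assumes that message exists and is a string. As a minimal sketch of how that assumption might be relaxed, the variant below substitutes an empty string when message is missing or null, so such documents should get a count of 0 instead of raising an error. The variable name guardedPipeline is hypothetical, and a non-string message (a number, for example) would still fail in $split.

```javascript
// Sketch only: same counting idea as the answer's pipeline, but a missing
// or null "message" is replaced by "" before splitting.
// "guardedPipeline" is an illustrative name, not part of the original answer.
const guardedPipeline = [
  {
    $project: {
      _id: 0,
      message: 1,
      count: {
        $subtract: [
          // $ifNull falls back to "" when "message" is null or absent.
          { $size: { $split: [ { $ifNull: [ "$message", "" ] }, "lupoK" ] } },
          1
        ]
      }
    }
  }
];

// In the mongo shell (collection name "test" as in the answer):
// db.test.aggregate(guardedPipeline)
```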

NOTES:

The $split operator splits the message string on a delimiter, in this case "lupoK", and returns an array of the tokens that were separated by that delimiter. A string containing n occurrences of "lupoK" always yields n + 1 tokens (some of them possibly empty strings), so the number of tokens minus 1 is the number of times "lupoK" occurs.

Check the result with these sample message strings:

"***first lupoK lupoK***"
"lupoKlupoK"
" lupoK lupoK "
""
"lupoKlupoKlupoK"
"lupoK"
"HELLO * lupoK* WORLD"
"HELLO WORLD"
"***first lupoK lupoKlupoK lupoK***lupoK *** last lupoK."

For example, the tokens for some strings:

"***first lupoK lupoK***" generates these three tokens: [ "***first ", " ", "***" ]
"HELLO * lupoK* WORLD" has these two tokens: [ "HELLO * ", "* WORLD" ]
"***first lupoK lupoKlupoK lupoK***lupoK *** last lupoK." has seven tokens: [ "***first ", " ", "", " ", "***", " *** last ", "." ]

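The token counts can also be checked outside the database. The following is a plain JavaScript sketch, not MongoDB code: it assumes that String.prototype.split with a literal delimiter behaves like $split for these inputs, and the function name countOccurrences is made up for illustration.

```javascript
// Plain JavaScript sketch (not MongoDB): split() is used here only to
// mimic what $split does with a literal delimiter.
function countOccurrences(message, needle) {
  // Splitting on the needle yields one more token than there are occurrences.
  return message.split(needle).length - 1;
}

// A few of the sample strings from the answer.
const samples = [
  "***first lupoK lupoK***",
  "lupoKlupoK",
  "",
  "HELLO WORLD",
  "***first lupoK lupoKlupoK lupoK***lupoK *** last lupoK."
];

for (const s of samples) {
  console.log(JSON.stringify(s), "->", countOccurrences(s, "lupoK"));
}
```

For these five strings the counts come out as 2, 2, 0, 0 and 6, one less than the number of tokens in each case.
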
Can you please elaborate? I am not able to understand what you have done. I was trying this: { $group : { _id : "$_id", fragments:{ "$text":"$text"}, total : { "$sum" : 1 }} } — Bipin Mishra, Dec 18 '19 at 14:41
Thanks a lot for your solution. It's helpful for my problem statement :) — Bipin Mishra, Dec 18 '19 at 14:53
Another way is: _Count the "fragments" array sub-document elements that have the field "text" with "lupoK" within it._; this can be converted to an aggregation. — prasad_, Dec 18 '19 at 15:28
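
One possible reading of that suggestion, as a hedged sketch and not necessarily what the commenter had in mind: filter the fragments array down to the elements whose text contains "lupoK" and take the size of the result. The name fragmentPipeline is hypothetical. Note that this counts matching fragments, not occurrences, so a fragment containing "lupoK" twice would still count once.

```javascript
// Sketch only: count "fragments" elements whose "text" contains "lupoK".
// "fragmentPipeline" is an illustrative name, not from the original thread.
const fragmentPipeline = [
  {
    $project: {
      _id: 0,
      message: 1,
      count: {
        $size: {
          $filter: {
            // Treat a missing or null "fragments" field as an empty array.
            input: { $ifNull: [ "$fragments", [] ] },
            as: "fragment",
            // $indexOfCP returns -1 when the substring is not found,
            // so an index >= 0 means the text contains "lupoK".
            cond: { $gte: [ { $indexOfCP: [ "$$fragment.text", "lupoK" ] }, 0 ] }
          }
        }
      }
    }
  }
];

// In the mongo shell (collection name "test" as in the answer):
// db.test.aggregate(fragmentPipeline)
```

For the sample document in the question, two of the three fragments have a text containing "lupoK", so this would be expected to give a count of 2.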
