MongoDB regex query to find unicode replacement character

Question

I am trying to manually fix some documents in my Mongo database that contain the Unicode replacement character (it looks like a question mark, see http://www.fileformat.info/info/unicode/char/fffd/index.htm). I have already fixed the bug that introduced these characters, but I would like to keep the old data too. So all I want is a simple query that returns all documents containing this character.

What I came up with so far is

db.songs.find({artist: /\ufffd/});

to find all songs with an artist name containing the replacement character. No luck so far.

This is due to the fact that this is a representation when you cannot properly view the characters. Mongo handles UTF-8 properly so it is not likely that this is your data, it is your view on it. — Michael Papile, Sep 27 '11 at 00:15
Yes, I know that and this is my data because I had some encoding bug that ended up in some of these replacement characters. That's why I want to go through the data and replace those manually with the right character. And to do that, I would like to view all entries that I need to edit. — mbuchetics, Sep 27 '11 at 06:06

score 7 · Accepted Answer · edited Feb 28 '12 at 21:27

It seems it doesn't like \uXXXX in a regex literal. Try:

db.songs.find({artist: new RegExp("\ufffd")});

answered Sep 27 '11 at 07:08 by pingw33n · edited Feb 28 '12 at 21:27 by mu is too short

I had to use `'` instead of `"` on a Mac but yay, this worked! – mbuchetics Sep 27 '11 at 07:15
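
The difference between the regex literal from the question and the RegExp constructor from the accepted answer can be checked in plain JavaScript, without a database. This is only a sketch: the sample string and the `replacementCharFilter` helper are made up for illustration, and the idea that the server-side regex engine does not understand the `\uXXXX` escape is an assumption based on the behaviour reported in this thread. What the check does show is that the literal keeps the escape as six characters of pattern text, while the constructor receives a string in which the escape has already been decoded to the single character U+FFFD.

```javascript
// Plain JavaScript, runnable in Node.js without a database connection.

// Regex literal: the pattern text is the six characters \ufffd.
const literal = /\ufffd/;

// Constructor: the string escape is decoded first, so the pattern text
// is the single character U+FFFD itself.
const constructed = new RegExp("\ufffd");

// Hypothetical artist name containing the replacement character.
const sample = "Bj\ufffdrk";

console.log(literal.source.length);     // length of the pattern text as written
console.log(constructed.source.length); // length of the decoded pattern text

// Both forms match in JavaScript itself; they differ only in the
// pattern text that would be handed on to another regex engine.
console.log(literal.test(sample), constructed.test(sample));

// Hypothetical helper (not part of any MongoDB API): builds a filter
// object for an arbitrary field name, in the style of the accepted answer.
function replacementCharFilter(field) {
  return { [field]: new RegExp("\ufffd") };
}
```

In the shell, the helper's result could be passed as `db.songs.find(replacementCharFilter("artist"))`, which should be equivalent to the query in the accepted answer.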

score 1 · Answer 2 · answered Jun 15 '16 at 12:19

Bumping an old thread :D In a regex literal you need to escape the backslash, otherwise it will escape the u instead:

db.songs.find({artist: /\\ufffd/});

answered Jun 15 '16 at 12:19 by Philip Pryde
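
What the doubled backslash matches can be checked in plain JavaScript. The sample strings below are made up for illustration. In this check, `/\\ufffd/` matches a literal backslash followed by the text `ufffd`, not the U+FFFD character itself, so this form seems useful only if the stored strings contain the escape sequence as visible text. Whether the server-side engine treats the pattern the same way is an assumption that is not verified here.

```javascript
// Plain JavaScript, runnable in Node.js without a database connection.

// Doubled backslash: matches a backslash followed by the text "ufffd".
const doubled = /\\ufffd/;

// Hypothetical value containing the real U+FFFD character.
const withCharacter = "Bj\ufffdrk";

// Hypothetical value containing the six visible characters \ufffd.
const withEscapeText = "Bj\\ufffdrk";

console.log(doubled.test(withCharacter));  // false
console.log(doubled.test(withEscapeText)); // true
```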
