3

I am trying to manually fix some documents in my Mongo database that contain the Unicode replacement character (U+FFFD, which looks like a question mark; see http://www.fileformat.info/info/unicode/char/fffd/index.htm). I have already fixed the bug that put these characters there, but I would like to keep the old data too. So all I want is a simple query that returns every document containing this character.

What I came up with so far is

db.songs.find({artist: /\ufffd/});

to find all songs with an artist name containing the replacement character. No luck so far.

mbuchetics
  • That character is what gets shown when the real characters cannot be displayed properly. Mongo handles UTF-8 properly, so the problem is probably not your data but your view of it. – Michael Papile Sep 27 '11 at 00:15
  • Yes, I know that, but in this case it is my data: an encoding bug wrote some of these replacement characters into it. That's why I want to go through the data and manually replace them with the right characters. To do that, I would like to view all the entries I need to edit. – mbuchetics Sep 27 '11 at 06:06

2 Answers

7

It seems it doesn't accept \uXXXX inside a regex literal. Try:

db.songs.find({artist: new RegExp("\ufffd")});
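One possible explanation, offered as an assumption rather than something stated in this thread: a regex literal keeps the escape as six ASCII characters in its pattern text, and the server's regex engine may not understand that escape, whereas a RegExp built from a string already contains the character itself. The sketch below is plain JavaScript (no MongoDB needed) and only shows the difference in pattern text; the sample value is hypothetical.

```javascript
// U+FFFD written two ways: a regex literal with an escape, and a
// RegExp built from a string that already holds the character.
const literal = /\ufffd/;
const fromString = new RegExp("\ufffd");

// Hypothetical sample value standing in for a damaged artist name.
const sample = "Bj\ufffdrk";

// Inside JavaScript itself, both patterns match the sample.
const literalMatches = literal.test(sample);
const fromStringMatches = fromString.test(sample);

// The pattern text differs: an escape sequence versus the character itself.
const literalSource = literal.source;
const fromStringSource = fromString.source;

console.log("literal matches:", literalMatches);
console.log("fromString matches:", fromStringMatches);
console.log("literal source length:", literalSource.length);
console.log("fromString source length:", fromStringSource.length);
```

If the shell forwards the pattern text unchanged, only the second form would hand the server a pattern that contains the replacement character directly.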
mu is too short
pingw33n
1

To bump an old thread: in a regex you need to escape the backslash, otherwise it will escape the u instead:

db.songs.find({artist: /\\ufffd/});
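For comparison, here is how the doubled-backslash pattern behaves in plain JavaScript, outside the mongo shell. This is only a sketch of the language-level behaviour; how the server's regex engine treats the same pattern text is not confirmed here, and the two test strings are hypothetical.

```javascript
// The pattern from the query above, as a plain JavaScript regex literal.
const doubledBackslash = /\\ufffd/;

// One character: the Unicode replacement character U+FFFD.
const replacementChar = "\ufffd";

// Six characters: a backslash followed by the letters "ufffd".
const backslashText = "\\ufffd";

// In JavaScript, \\ inside a regex literal matches a literal backslash.
const matchesReplacementChar = doubledBackslash.test(replacementChar);
const matchesBackslashText = doubledBackslash.test(backslashText);

console.log("matches U+FFFD:", matchesReplacementChar);
console.log("matches backslash text:", matchesBackslashText);
```

It is worth running the query against a document known to contain the character before relying on either form.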

Philip Pryde