MongoDB querying whitespace with regex

Question

I've got a large collection of text data stored in MongoDB that users can query via keyword or phrase, and have an issue where some data has the Unicode character U+00A0 (no-break space) instead of a regular space.

Fixing up the data not being an option (those nbsps are there intentionally), I still want the user to be able to search and find that data. So I updated our Mongo query-building code to search for any whitespace [\s] in places where the user entered a space, resulting in a query like so:

{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[\s]performance" , "$options" : "i"} }}}

(there's more to the query, that's just the relevant bit).

Unfortunately, this doesn't return the expected results. After trying a bunch of other approaches, I eventually discovered that I get the correct results when I search for "not non-whitespace" [^\S], like so:

{ "tt" : { "$elemMatch" : { "x" : { "$regex" : "high[^\S]performance" , "$options" : "i"} }}}

Which leads to my question: why does "any whitespace" ([\s]) fail to find this text while "not non-whitespace" ([^\S]) finds it successfully? Does Mongo have a different set of rules for what counts as whitespace and non-whitespace?

Data is UTF-8 throughout; MongoDB version is 2.2.2.

score 6 · Accepted Answer · edited May 23 '19 at 14:30

I suspect the problem here is with the \, not with the spaces. Can you try writing \\ instead to test my conjecture?
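To illustrate the conjecture, here is a minimal sketch of what the two patterns turn into if one backslash is lost before the regex engine sees them. It uses Python's re module, not the regex engine inside MongoDB, so whitespace-class behavior for U+00A0 may differ there; the sample text and the names below are made up for illustration.

```python
import re

# Hypothetical sample value containing U+00A0 (no-break space).
NBSP_TEXT = "High\u00a0Performance"

# The patterns as intended, with the backslash intact.
INTENDED_WS = r"high[\s]performance"
INTENDED_NOT_NONWS = r"high[^\S]performance"

# The same patterns after one backslash has been dropped somewhere upstream.
LOST_WS = "high[s]performance"          # class now means: the literal letter s
LOST_NOT_NONWS = "high[^S]performance"  # class now means: anything except s


def matches(pattern, text):
    """Case-insensitive search, mirroring the "i" option in the query."""
    return re.search(pattern, text, re.IGNORECASE) is not None


print(matches(INTENDED_WS, NBSP_TEXT))         # True
print(matches(INTENDED_NOT_NONWS, NBSP_TEXT))  # True
print(matches(LOST_WS, NBSP_TEXT))             # False
print(matches(LOST_NOT_NONWS, NBSP_TEXT))      # True

# The degraded negated class is far too permissive: it accepts any
# character other than "s", not just whitespace.
print(matches(LOST_NOT_NONWS, "highxperformance"))  # True
```

If this is what happened, [^\S] only appeared to work: with the backslash gone it matches almost any single character between the two words, which happens to include the no-break space.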

answered Jan 20 '14 at 20:06 by Igor Chubin
edited May 23 '19 at 14:30 by Medet Tleukabiluly

Yup, that was it -- just realized that my upstream code already had \\ but that only generated one \ in the query, I needed to build the query with "\\\\s" :D – devin Jan 20 '14 at 20:17
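The escaping layers described in this comment can be sketched in Python. This is only an illustration under assumptions: the asker's actual query-building code and its language are not shown, and the variable and function names here are hypothetical. The idea is that each layer of string processing (source literal, then serialized query text) consumes one level of backslash escaping, so letting a serializer produce the query text avoids counting backslashes by hand.

```python
import json

# One real backslash in the value; the raw string avoids source-level escaping.
pattern = r"high[\s]performance"

query = {"tt": {"$elemMatch": {"x": {"$regex": pattern, "$options": "i"}}}}

# Serializing escapes the backslash, so the query text carries two
# backslash characters for the single one in the pattern.
query_text = json.dumps(query)
print(query_text)

# Parsing the text again yields the original one-backslash pattern.
round_tripped = json.loads(query_text)
recovered = round_tripped["tt"]["$elemMatch"]["x"]["$regex"]


def strict_parse_fails(text):
    """Return True if Python's strict JSON parser rejects the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


# Hand-built query text with only one backslash is not valid JSON.
# Python's parser rejects it; a more lenient parser may instead drop
# the backslash silently, which would match the symptom in the question.
hand_written = '{"$regex": "high[\\s]performance"}'
print(strict_parse_fails(hand_written))  # True
```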
