1

In a certain ID field we are indexing on a document looks like this: 1234 45676

We want to be able to do fulltext searches on each of the 2 groups of numbers, just as if they are strings. I escape the number groups in quotes, which the mongo documentation says will ensure that the entire string will be searched for.

For example, if an indexed field has the word "blue" in it, only the word "blue" will be searched for. Searching on "b" will not yield a hit. (we are using non-stemmatic searching for the time being).

But that is not the result with the number groups. Even though we escape our number groups with quotes ("45676"), the number groups are subjected to wildcard searches. In our example, searching on "4" will hit on "45676".

How can we ensure that "45676" is treated as a string that will yield a hit only if "45676" is searched for?

All suggestions or perspectives are welcome! Thanks in advance.

Ben Weaver
  • 960
  • 1
  • 8
  • 18

1 Answers1

1

There are two solutions for searching for a group of numbers as a unique single word.

1) Use the $text operator and text index

2) Use the $regexp operator or a regular expression.

Setup:

db = connect("test"); // same as `use test;`
db.a.drop();
db.a.insert([
    { _id: 1, txt : "Log 1: Page 23 1234 45676" },
    { _id: 2, txt : "Log 2: Page 45 0000 00000" },
    { _id: 3, txt : "Log 3: Page 59 1337 11111" }
]);

1. Example using the $text operator

Index the searchable field

db.a.ensureIndex({ txt : "text" });

Query using the $text operator

db.a.find({ $text : { $search : "45" } });

Output

{ _id: 2, txt : "Log 2: Page 45 0000 00000" }

Notice that output doesn't return the doc with _id 1, even though it contains 45676.

2. Example using a regular expression

For the regular expression, you need to wrap the numbers in a word boundary, \b, to avoid them being matched within a string.

Example: Searching for 4 without word boundary.

/4/.test("4") == true
/4/.test("1234") == true

Searching for 4 with word boundary.

/\b4\b/.test("4") == true
/\b4\b/.test("1234") == false

Search for 45 using the regular expression

db.a.find({ txt : /\b45\b/ });

Output

{ _id: 2, txt : "Log 2: Page 45 0000 00000" }

You can form a regular expression from user's input with the following functions.

function escapeRegExp(str) {
  return String(str).replace(/[[\]/{}()*+?.\^$|-]/g, "\\$&");
}
function wordToRegExp( query ){
  return new RegExp( "\\b" + escapeRegExp( query ) + "\\b" );
}
var queryForWord = wordToRegExp( 45 );
// queryForWord would be sent from your server side, not created in mongo shell.

db.a.find({ txt : queryForWord });

More info:

Community
  • 1
  • 1
Larry Battle
  • 9,008
  • 4
  • 41
  • 55
  • Thank you--I am now closer to understanding the problem. It does not lie with numbers. The problem is that our indexed fields contain more than 1 word, for example, "My Wonderful Product". As long as the field contains a single word or char group, for example, "MyWonderfulProduct", then all is well--a query term of, say, "Wond" will not produce a hit . But in the case of our field set to "My Wonderful Product" each term is wildcarded: a query term of "Wond" will produce a hit. – Ben Weaver Jan 28 '15 at 20:12
  • @BenWeaver See if the function wordToRegExp helps solve your problem. – Larry Battle Jan 28 '15 at 20:23
  • Right--I am checking wordToRegExp out now, and will reproduce it in our clojure code. Thanks! – Ben Weaver Jan 28 '15 at 21:16