42

I'm trying to build a search feature for a project which narrows down items based on a user search input and if it matches the keywords listed against items. For this, I'm saving the item keywords in a data attribute and matching the query with these keywords using a RegExp pattern.

I'm currently using this expression, which I know is not correct and need your help on that:

new RegExp('\\b(' + query + ')', 'gi'))) where query is | separated values of the query entered by the user (e.g. \\b(meat|pasta|dinner)). This returns me a match even if there is only 1 match, say for example - meat

Just to throw some context, here's a small example:

If a user types: meat pasta dinner it should list all items which have ALL the 3 keywords listed against them i.e. meat pasta and dinner. These are independent of the order they're typed in.

Can you help me with an expression which will match ALL words in a query, in any order?

Chris Seymour
  • 83,387
  • 30
  • 160
  • 202
kayen
  • 4,838
  • 3
  • 19
  • 20
  • Umm, not exactly - I need to match the entire user input with the keywords against the items and only return true if the entire input matches the keywords on those items. – kayen Dec 17 '12 at 09:05
  • Basically I need to come up with an AND expression which will AND the user input `e.g. meat dinner pie` and check against the keywords `e.g. Pasta meat continental kebab pie simple`. In this case, the expression will fail as the keyword contains `meat` and `pie` but not `dinner`. – kayen Dec 17 '12 at 09:08
  • 1
    Why don't you use simple `string` method for this? I mean, if there are no repeated keywords, then this can be achieved easily with Regex. But with repeated keywords, you should better avoid Regex here. Just use `indexOf` function with a for-loop. Or, may be a better String function, that I might not be aware of. – Rohit Jain Dec 17 '12 at 09:15

4 Answers4

67

You can achieve this will lookahead assertions

^(?=.*\bmeat\b)(?=.*\bpasta\b)(?=.*\bdinner\b).+

See it here on Regexr

(?=.*\bmeat\b) is a positive lookahead assertion, that ensures that \bmeat\b is somewhere in the string. Same for the other keywords and the .+ is then actually matching the whole string, but only if the assertions are true.

But it will match also on "dinner meat Foobar pasta"

stema
  • 90,351
  • 20
  • 107
  • 135
9

stema's answer is technically correct, but it doesn't take performance into account at all. Look aheads are extremely slow (in the context of regular expressions, which are lightning fast). Even with the current logic, the regular expression is not optimal.

So here are some measurements, calculated on larger strings which contain all three words, running the search 1000 times and using four different approaches:

stema's regular expression

/^(?=.*\bmeat\b)(?=.*\bpasta\b)(?=.*\bdinner\b).+/

result: 605ms

optimized regular expression

/^(?=.*?\bmeat\b)(?=.*?\bpasta\b)(?=.*?\bdinner\b)/

uses lazy matching and doesn't need the end all selector

result: 291ms

permutation regular expression

/(\bmeat\b.*?(\bpasta\b.*?\bdinner\b|\bdinner\b.*?\bpasta\b)|\bpasta\b.*?(\bmeat\b.*?\bdinner\b|\bdinner\b.*?\bmeat\b)|\bdinner\b.*?(\bpasta\b.*?\bmeat\b|\bmeat\b.*?\bpasta\b))/

result: 56ms

this is fast because the first pattern is matching, if the last pattern matched, it would be even slower than the look ahead one (300 ms)

array of regular expressions

var regs=[/\bmeat\b/,/\bpasta\b/,/\bdinner\b/];
var result = regs.every(reg=>reg.test(text));

result: 26ms

Note that if the strings are crafted to not match, then the results are:

  • 521ms
  • 220ms
  • 161ms - much slower because it has to go through all the branches
  • 14ms

As you can see, in all cases just using a loop is an order of magnitude faster, not to mention easier to read.

The original question was asking for a regular expression, so my answer to that is the permutation regular expression, but I would not use it, as its size would grow exponentially with the number of search words.

Also, in most cases this performance issue is academic, but it is necessary to be highlighted.

Siderite Zackwehdex
  • 6,293
  • 3
  • 30
  • 46
5

your regex looks pretty good:

\b(meat|pasta|dinner)\b

Check that the length of matches equals the number of keywords (in this case, three):

string.match(re).length === numberOfKeywords

where re is the regex with a g flag, string is the data and numberOfKeywords is the number of keywords

This assumes that there are no repeated keywords.

ColBeseder
  • 3,579
  • 3
  • 28
  • 45
  • 2
    In this model the duplicates need to be removed too – closure Dec 17 '12 at 09:21
  • Thanks for the help! But yes, the keywords could repeat so I'm going ahead with stema's answer above. +1 for the logic though. – kayen Dec 17 '12 at 10:21
  • Also if you need condition for atleast 'X' items where 'X' denotes start range then you can use below regex (modified for atleast one item) \b(meat|pasta|dinner){1,3}\b – vCillusion Jan 31 '16 at 04:09
2

Based on the accepted answer I wrote a simple Java method that builds the regex from an array of keywords

public static String regexIfAllKeywordsExists(String[] keywords) {
    StringBuilder sb = new StringBuilder("^");

    for (String keyword : keywords) {
        sb.append("(?=.*\\b");
        sb.append(keyword);
        sb.append("\\b)");
    }

    sb.append(".+");

    return sb.toString();
}
prettyvoid
  • 3,446
  • 6
  • 36
  • 60