3

I am writing a search bar with an autocomplete feature that is hooked up to an endpoint. I am using a regex to determine which "context" I am in within the query typed into the search bar. The three contexts are "attribute," "value," and "operator." The two operators that are allowed are "AND" and "OR." Below is an example query.

Color: Blue AND Size: "Women's Large" (<-- multi-word values or attribute names are surrounded by quotation marks)

I need my regex to match once the user puts a space after Blue, and to keep matching while the user types "A/AN/AND/O/OR". Once they have put a space after the operator, I need it to stop matching.

This is the expression I have come up with.

const contextIsOperator = /[\w\d\s"]+: *[\w\s\d"]+ [\w]*$/

It matches once I put a space after "Blue," but matches for everything I put after that. If I replace the last * in the expression with a +, it works when I put a space after "Blue" and start manually typing one of the operators, but not if I just have a space after "Blue."

The pattern I have in my head written in words is:

  1. group of one or more characters/digits/spaces/quotation marks
  2. followed by a colon
  3. followed by an optional space
  4. followed by another group of one or more characters/digits/space/quotation marks
  5. followed by a space (after the value)
  6. followed by one or more characters (this is the operator)

How do I solve this problem?

Emma
Sean San
  • When using code in a question, please put all code into code blocks: usually, indent all code lines by 4 spaces, or surround the code block with 3 backticks (`\`\`\``). For inline code on the same line as non-code, surround the code with a single backtick on each side (`\``). – CertainPerformance May 29 '19 at 22:15
  • You're probably better off doing this with a lexer + parser. But someone will eventually be able to help you with the regex. – Todd Chaffee May 29 '19 at 22:18
  • Lexer+parser may not be necessary. This is simple enough to model with a simple state machine, but I agree that a regex probably isn't the proper solution for this. – c1moore May 29 '19 at 22:20
  • It's not necessary to use both `\w` and `\d`, since digits are included in `\w`. – Barmar May 29 '19 at 22:21
  • When you say "characters" I think you mean "letters"? "characters" means any type of character. – Barmar May 29 '19 at 22:23
  • It's not matching because of the apostrophe in `Women's Large`. That's not a letter, digit, space, or quotation mark. – Barmar May 29 '19 at 22:25
  • After playing around with this a bit, you'll definitely want a state machine _or_ you could make the input more specific (e.g. always require quotes around the value that appears after the colon). As is, your language is not deterministic enough for a regex. – c1moore May 29 '19 at 22:50

4 Answers

1

Change [\w]* to something that just matches AND, OR, or one of their prefixes. Then you can make it optional with ?

[\w\s"]+: *[\w\s"]+ (A|AN|AND|O|OR)?$

DEMO

Note that Size: Women's Large won't match this because the apostrophe isn't in \w; that only matches letters, digits, and underscore. You'll need to add any other punctuation characters that you want to allow in these fields to the character set.
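One caveat worth checking: because `\s` is in the value character class, the pattern above still matches `Color: Blue AND ` (it reads `Blue AND` as the value followed by a trailing space). A rough sketch of a tighter variant follows. It is not part of the original answer, and it assumes that unquoted attributes and values contain no whitespace, colons, or quotation marks and that multi-word ones are always double quoted. The names `term`, `pair`, and `contextIsOperator` are only illustrative.

```javascript
// A quoted term ("Women's Large") or an unquoted one with no spaces, colons, or quotes.
// Assumption: this grammar is inferred from the question's example, not a confirmed spec.
const term = String.raw`(?:"[^"]*"|[^\s:"]+)`;
const pair = `${term}: *${term}`;

// Zero or more completed "pair OPERATOR " clauses, then one more pair, a space,
// and an optional prefix of AND/OR at the very end of the input.
const contextIsOperator = new RegExp(
  `^(?:${pair} (?:AND|OR) )*${pair} (?:A|AN|AND|O|OR)?$`
);

const samples = [
  'Color: Blue',       // still typing the value
  'Color: Blue ',      // space after the value
  'Color: Blue AN',    // partway through an operator
  'Color: Blue AND ',  // operator finished
  'Color: Blue AND Size: "Women\'s Large" ',
];
for (const s of samples) {
  console.log(JSON.stringify(s), contextIsOperator.test(s));
}
```

Excluding whitespace from unquoted terms is what lets the match stop after `AND `; the trade-off is that unquoted multi-word values are no longer accepted.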

Barmar
  • This solution won't work even with the example that was provided (even taking out the apostrophe). `$` means the end of the string, so you specify that AND/OR is at the end of the string. Also, why do you have `(A|AN|AND|O|OR)`? That will match invalid operators (`AN`, `A`, `O`). – c1moore May 29 '19 at 22:47
0

Edit: this is the final one, check the unit tests here

const regex = /((("[\w\s"'']+(?="\b))"|[\w"'']+):\s?(("[\w\s"'']+(?="\b))"|[\w"'']+)\s(AND|OR)(?=\b\s))+/

That monstrosity should match (NOTE: QUOTED KEYS/VALUES MUST BE DOUBLE QUOTED):

Color: Blue AND "Size5":"Women's Large"
"weird KEy":regularvalue OR otherKey: "quoted value"
yaas
  • This doesn't properly match the example he provided: https://regex101.com/r/g5EMqM/1 – c1moore May 29 '19 at 22:51
  • @c1moore I have fixed it, thanks! It turns out, the previous regex just needed an extra space at the end, but this one is **much** better. https://regex101.com/r/2THj9F/4/tests – yaas May 29 '19 at 23:29
0

As is, your language is not deterministic enough to be properly modeled with a regex. That being said, there are 2 approaches you can take:

  1. Require all values (the stuff after a : and before an operator) to be enclosed in quotes
  2. Build a simple state machine that can parse the data more intelligently. (Google Finite State Machine Parser)

If you choose to use the first method, you can use the following regex:

^(("?[\w\s]+"?): ?("[\w\s']+")( (AND|OR) )?)+$

I would explain the different components, but regex101 already does that for me with really good visuals and detail.
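
For the second approach, a minimal state-machine sketch might look like the following. It assumes the grammar described in the question (an attribute, a colon, an optional space, a value, a space, then `AND` or `OR`, with double quotes around multi-word attributes and values). The function name `getContext` and the state names are hypothetical, and the sketch does no validation: an unrecognized operator simply leaves it in the operator state.

```javascript
// Walk the query one character at a time and report which context the cursor
// (assumed to be at the end of the string) is currently in.
function getContext(query) {
  let state = 'attribute';
  let inQuotes = false;      // true while inside a double-quoted attribute or value
  let valueStarted = false;  // lets us skip the optional space after the colon
  let operator = '';

  for (const ch of query) {
    if (ch === '"' && state !== 'operator') {
      inQuotes = !inQuotes;
    }
    switch (state) {
      case 'attribute':
        // An unquoted colon ends the attribute name.
        if (!inQuotes && ch === ':') {
          state = 'value';
          valueStarted = false;
        }
        break;
      case 'value':
        if (ch !== ' ') {
          valueStarted = true;
        } else if (!inQuotes && valueStarted) {
          // An unquoted space after the value starts the operator.
          state = 'operator';
          operator = '';
        }
        break;
      case 'operator':
        if (ch !== ' ') {
          operator += ch;
        } else if (operator === 'AND' || operator === 'OR') {
          // A space after a complete operator starts the next attribute.
          state = 'attribute';
        }
        break;
    }
  }
  return state;
}

console.log(getContext('Color: Blue '));      // operator
console.log(getContext('Color: Blue AND '));  // attribute
```

Unlike a single regex, this reports all three contexts from one pass, so the autocomplete can switch on the returned string.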

c1moore
0

Here you go, try this out

^(?:"[^"]*"|[^\s:]+):[ ](?:"[^"]*"|[^\s:]+)[ ](?:A(?:N(?:D(?:[ ](*SKIP)(?!))?)?)?|O(?:R(?:[ ](*SKIP)(?!))?)?)?

https://regex101.com/r/neUQ0g/1

Explained

 ^                             # BOS
 (?:                           # Attribute
      "
      [^"]* 
      "
   |  
      [^\s:]+ 
 )
 :
 [ ] 
 (?:                           # Value
      "
      [^"]* 
      "
   |  
      [^\s:]+ 
 )
 [ ]                           # Start matching after Attribute: Value + space
 (?:                           # Operator
      A
      (?:
           N
           (?:
                D 
                (?:                           # Stop matching after 'AND '
                     [ ] 
                     (*SKIP) 
                     (?!)
                )?
           )?
      )?
   |  
      O 
      (?:
           R 
           (?:                           # Stop matching after 'OR '
                [ ] 
                (*SKIP)                    
                (?!)
           )?
      )?
 )?
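
One thing to watch for: `(*SKIP)` is a PCRE backtracking verb, and to my knowledge JavaScript's `RegExp` does not support it, so the pattern above would need adapting before it could replace the `contextIsOperator` expression from the question. A possible JavaScript rendering of the same idea is sketched below. The negative lookahead `(?!(?:AND|OR)[ ])` is a substitution made for this sketch in place of `(*SKIP)(?!)`, not the answer's own pattern, and the name `contextIsOperatorJs` is hypothetical.

```javascript
// Attribute, colon, one space, value, one space, then an optional prefix of AND/OR.
// The lookahead rejects the match once a complete operator is followed by a space.
const contextIsOperatorJs =
  /^(?:"[^"]*"|[^\s:]+):[ ](?:"[^"]*"|[^\s:]+)[ ](?!(?:AND|OR)[ ])(?:A(?:ND?)?|OR?)?/;

const inputs = [
  'Color: Blue',               // no space after the value yet
  'Color: Blue ',              // ready for an operator
  'Color: Blue AN',            // partway through AND
  'Color: Blue AND ',          // operator completed
  'Size: "Women\'s Large" O',  // quoted value, partway through OR
];
for (const s of inputs) {
  console.log(JSON.stringify(s), contextIsOperatorJs.test(s));
}
```

Like the pattern it is adapted from, this only looks at the first `attribute: value` pair and is not anchored at the end of the input.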