I have used the following regular expression to search and match augmented assignment operators:

AUG_ASSIGN = r'\+=|\-=|\*=|@=|/=|%=|/@=|&=|\|=|\^=|\<\<=|\>\>=|\*\*=|//='

I have a hunch that the repeated occurrences of '=' can be eliminated using positive look-ahead or positive look-behind. So my first stop was the grammar section of the Python reference manual. Even there, '=' occurs multiple times:

augop:           "+=" | "-=" | "*=" | "/=" | "%=" | "**="
               | ">>=" | "<<=" | "&=" | "^=" | "|="

Understandable, as standard BNF does not include positive look-ahead or look-behind.

Next, I went through the suggestions SO itself offered as I typed this question, but neither they nor my own searches got me close to an answer. Is it possible to shrink the expression to something like:

AUG_ASSIGN = <set of all binary operators>(?=\=)
Aran-Fey
Seshadri R
  • ...why not just `(\+|-|\*|@|/|%|...)=`? No lookarounds required. – Aran-Fey Oct 19 '18 at 06:21
  • `(?:[-+*/@&%^|]|[<>/*]{2})=` – anubhava Oct 19 '18 at 06:33
  • Beware: if you try to do serious parsing, simple regexes are not enough. Think of the following: `txt = "An augmented assignment example is a += b"`. As it is enclosed in a literal string (which could even be a multi-line one...), it must not be parsed. – Serge Ballesta Oct 19 '18 at 07:46
  • And anyway, BNF is perfectly able to do that without look-ahead or look-behind tricks: `simple_op: "+" | "-" | "*" | "/" | "%" | "**" | ">>" | "<<" | "&" | "^" | "|"` `augop: simple_op "="` – Serge Ballesta Oct 19 '18 at 07:54
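
To illustrate the warning about string literals, here is a minimal sketch that uses the standard library `tokenize` module instead of a regex. The helper name `find_aug_assign` and the operator set `AUG_OPS` are hypothetical, introduced only for this example, and the operator list is assumed to be the Python 3 set (including `@=` and `//=`).

```python
import io
import tokenize

# Assumed set of Python 3 augmented assignment operators.
AUG_OPS = {'+=', '-=', '*=', '@=', '/=', '%=', '&=', '|=', '^=',
           '<<=', '>>=', '**=', '//='}

def find_aug_assign(source):
    """Return (row, col, op) for each augmented assignment operator token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start[0], tok.start[1], tok.string)
            for tok in tokens
            # Operators are OP tokens; string contents are STRING tokens.
            if tok.type == tokenize.OP and tok.string in AUG_OPS]

source = 'txt = "An augmented assignment example is a += b"\ncount += 1\n'
print(find_aug_assign(source))  # [(2, 6, '+=')]
```

The `+=` inside the string on the first line is skipped because the tokenizer reports the whole literal as a single string token; only the real operator on the second line is returned.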

1 Answer

Put the single characters in a character set, and alternate with the other options which have two of the same character, so you can put them in a character set too (inside a group so that the same character can be matched again with a backreference). Put all of the above in a group, and end with =:

(?:[+\-*@&/%^|^]|([*/><])\1)=

https://regex101.com/r/JA84zS/4

No lookaround needed.
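
As a quick sanity check, the pattern can be run through Python's `re` module. This is only a sketch: the list `OPERATORS` is an assumed set of Python 3 augmented assignment operators, and the negative samples are made up for illustration.

```python
import re

# The compact pattern from this answer.
AUG_ASSIGN = re.compile(r'(?:[+\-*@&/%^|^]|([*/><])\1)=')

# Assumed list of Python 3 augmented assignment operators.
OPERATORS = ['+=', '-=', '*=', '@=', '/=', '%=', '&=', '|=', '^=',
             '<<=', '>>=', '**=', '//=']

# Every operator should be matched in full.
matched = [op for op in OPERATORS if AUG_ASSIGN.fullmatch(op)]

# Mixed pairs such as '*>' must not be accepted before the '='.
rejected = [s for s in ['*>=', '<*=', '></='] if AUG_ASSIGN.fullmatch(s)]

print(matched == OPERATORS, rejected)  # True []
```

`fullmatch` is used here so that a sample like `<*=` is not counted as a hit merely because it contains `*=`.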

CertainPerformance
  • Nice. It is news to me that '*' does not have to be escaped here; your expression works perfectly, as confirmed by regex101.com. I will wait for other answers before accepting yours. However, I have registered my appreciation for your response. – Seshadri R Oct 19 '18 at 06:31
  • It's only because it's in a character set - *most* characters don't have to be escaped in a character set. (I think the only exceptions are the string's delimiters, and backslashes) – CertainPerformance Oct 19 '18 at 06:33
  • And the range indicator '-', if it is meant as a minus sign. Or not? @anubhava's comment uses it unescaped and it works fine – Seshadri R Oct 19 '18 at 06:41
  • Doesn't this regex also capture unintended strings like `*>=`, `>*=`, `*<=`, `<*=`, etc.? To be a little more precise, we could use `(?:[+\-*@&/%^|]|>>|<<|\*\*|//)=` instead. – Pushpesh Kumar Rajwanshi Oct 19 '18 at 07:19
  • @PushpeshKumarRajwanshi Ah, good point, thanks. Another way to do it, keeping the code DRY, would be to capture the character set in a group, and backreference it. https://regex101.com/r/JA84zS/4 – CertainPerformance Oct 19 '18 at 07:22
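
The difference discussed in these comments can be sketched in Python. The `loose` pattern is the `{2}` form quoted in a comment on the question; `strict` is an adaptation of it with a backreference, written for this example only, and the variable names and samples are illustrative.

```python
import re

# Any two characters drawn from the set may precede '='.
loose = re.compile(r'(?:[-+*/@&%^|]|[<>/*]{2})=')
# Backreference form: the second character must repeat the first.
strict = re.compile(r'(?:[-+*/@&%^|]|([<>/*])\1)=')

samples = ['*>=', '>*=', '*<=', '<*=', '>>=', '**=']

loose_hits = [s for s in samples if loose.fullmatch(s)]
strict_hits = [s for s in samples if strict.fullmatch(s)]

print(loose_hits)   # all six samples
print(strict_hits)  # ['>>=', '**=']
```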