
I am confused about how a regex engine handles a literal that follows a repetition. I was reading this: http://www.regular-expressions.info/print.html

It talks about matching a double-quoted string:

Suppose you want to match a double-quoted string. Sounds easy. We can have any number of any character between the double quotes, so «".*"» seems to do the trick just fine. The dot matches any character, and the star allows the dot to be repeated any number of times, including zero.

Now how does the regex know when to stop? Won't it reach the end of the file? My analysis: I thought of two possible ways this could work.

METHOD 1: The regex finds a double quote, then keeps matching any character until the end of the file (or the line). Then it 'realizes' there is no double quote left to match, so it goes back to the previous permutation of .* (stopping one character earlier) and keeps going backward until a double quote is matched. If this method is the real one, .* sounds like a bad idea.

OR METHOD 2: The regex matches a double quote, then keeps matching any character until it reaches the next double quote. I think this is unlikely, since the book implies otherwise.
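To make Method 1 concrete, here is how I picture it as a toy Python function. This is only my mental model written out, not how any real engine is implemented; the function name match_quoted_greedy is made up, and I am assuming the dot does not match a newline.

```python
def match_quoted_greedy(text):
    """Toy model of Method 1 for the pattern ".*" (hypothetical helper)."""
    start = text.find('"')
    while start != -1:
        # Assumption: the dot does not match a newline, so .* stops at end of line.
        line_end = text.find('\n', start + 1)
        if line_end == -1:
            line_end = len(text)
        # .* first grabs everything up to line_end, then gives back
        # one character at a time until a closing quote can follow it.
        for end in range(line_end, start, -1):
            # .* covers text[start + 1:end]; the closing quote must sit at text[end].
            if end < len(text) and text[end] == '"':
                return text[start:end + 1]
        # No closing quote found for this opening quote; try the next one.
        start = text.find('"', start + 1)
    return None


print(match_quoted_greedy('say "hello" and "bye" now'))
```

If this model is right, the match runs from the first double quote to the last one on the line, not to the nearest one.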

Of course, I can test to see which method is the actual one, but the engine may be using a totally different method.
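A quick test along these lines, using Python's re module (a backtracking engine; the sample text is made up), only shows the final result, not the steps the engine took to get there. It also compares the greedy pattern with a lazy one and a negated character class.

```python
import re

# Made-up sample with two quoted strings on one line.
text = 'say "hello" and "bye" now'

greedy = re.search(r'".*"', text).group()      # greedy star
lazy = re.search(r'".*?"', text).group()       # lazy star
negated = re.search(r'"[^"]*"', text).group()  # anything except a double quote

print(greedy)   # "hello" and "bye"
print(lazy)     # "hello"
print(negated)  # "hello"

# By default the dot does not match a newline, so .* stops at the end of the line.
multiline = '"a\nb"'
print(re.search(r'".*"', multiline))                        # None
print(re.search(r'".*"', multiline, re.DOTALL).group() == multiline)  # True
```

The greedy result running to the last double quote on the line is consistent with Method 1, while the lazy and negated-class patterns give the result Method 2 would predict.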

Side note: I care about understanding how the regex engine deals with X or Y, of course, because that makes you better at using it (just like understanding how closures work), and because I don't feel good about using things I don't understand (typical developer).

  • Have a look at how PCRE handles it https://regex101.com/r/f0krs0/1/debugger ... and compare with a slightly different regex https://regex101.com/r/f0krs0/2/debugger or https://regex101.com/r/f0krs0/3/debugger. Internal optimizations aside, that's pretty much what you got. It will depend however on which regex engine in particular you're talking about, e.g. some might use NFAs instead of backtracking. Some interesting related reading https://swtch.com/~rsc/regexp/regexp1.html – user3942918 Jul 10 '18 at 04:13
  • fair enough, the swtch html sounds interesting, I'll read it. – Waleed Dahshan Jul 10 '18 at 04:28
