I have logged users' actions on a website, for example:

00:00 firstpage.textbox.hover
00:01 firstpage.textbox.push
00:02 firstpage.textbox.type
00:03 firstpage.textbox.submit

I am trying to find out whether a specific ordered sequence of actions occurred, e.g.

firstpage.textbox.type firstpage.textbox.submit

which may have any other actions before, between or after them.
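To make the intent concrete, here is a minimal Python sketch of the check I am after. It is not the SQL I actually run; the function name `contains_in_order` and the sample `log` list are made up for illustration. It only shows the logic: the pattern's actions must appear in the same order, with any number of other actions in between.

```python
from typing import Iterable, Sequence


def contains_in_order(actions: Iterable[str], pattern: Sequence[str]) -> bool:
    """Return True if every item of `pattern` occurs in `actions` in order.

    Other actions may appear before, between, or after the pattern items.
    """
    it = iter(actions)
    # `step in it` advances the iterator up to and including the first match,
    # so the next step is only searched for in the remaining actions.
    return all(step in it for step in pattern)


# Hypothetical sample data modelled on the log excerpt above.
log = [
    "firstpage.textbox.hover",
    "firstpage.textbox.push",
    "firstpage.textbox.type",
    "firstpage.textbox.submit",
]

found = contains_in_order(log, ["firstpage.textbox.type", "firstpage.textbox.submit"])
found_reversed = contains_in_order(log, ["firstpage.textbox.submit", "firstpage.textbox.type"])
```

Here `found` is true and `found_reversed` is false, because the order of the two actions matters.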

So far I have concatenated the actions into one string like:

firstpage.firstblock.open firstpage.textbox.push firstpage.textbox.type firstpage.textbox.submit

and then applied the following regular expression in order to find if there was a pattern I am interested in

select regexp_count(concatenated_actions, '.*firstpage.textbox.type.*firstpage.textbox.submit.*') from t

Everything works fine, but I get the following error when I run the query on data with more than about 50 items:

Error running query: The regular expression provided for REGEXP_COUNT produced excessive matches. Rewrite the expression. code: 8002 context: The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous.

I think the problem is caused by backtracking in my regular expression, which has the form 'A.*B.*C'.

Do you know how to change this regex to avoid this problem?

Sergei
  • You might remove the first `.*` at least. – Wiktor Stribiżew Mar 15 '19 at 17:10
  • @WiktorStribiżew thank you, it increased the lower bound of the data length, however there still comes the same error – Sergei Mar 16 '19 at 18:47
  • I think you can also remove the last '.*' and then make the middle one non-greedy: 'firstpage.textbox.type.*?firstpage.textbox.submit' – Poul Bak Mar 16 '19 at 19:54
  • The last `.*` does not play any role here, it may remain. The second `.*` can be turned to lazy, but then it will make the pattern slower if the `C` part is closer to the end of the string. This is not the solution. Unroll-the-loop is. But the regex engine must support a lookahead because the right-hand pattern is a multicharacter string. I doubt this one does. – Wiktor Stribiżew Mar 16 '19 at 20:13
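
The unroll-the-loop idea from the last comment can be sketched as follows. This is only an illustration in Python, whose `re` module supports lookahead; whether the SQL engine in the question accepts such a pattern is not established here. The helper name `build_unrolled` is hypothetical. The gap between the two actions is matched by consuming characters that cannot start the second action, plus any occurrence of its first character that is not followed by the rest of it.

```python
import re


def build_unrolled(first: str, second: str) -> re.Pattern:
    """Build `first ... second` with an unrolled gap instead of `.*`."""
    head, tail = second[0], second[1:]
    h = re.escape(head)
    # Gap: characters other than `head`, then repeatedly a `head` that does
    # NOT begin `second` (checked by the negative lookahead), followed by
    # more characters other than `head`.
    gap = f"[^{h}]*(?:{h}(?!{re.escape(tail)})[^{h}]*)*"
    # re.escape makes the dots in the action names match literally.
    return re.compile(re.escape(first) + gap + re.escape(second))


pattern = build_unrolled("firstpage.textbox.type", "firstpage.textbox.submit")

# Sample strings modelled on the concatenated example in the question.
in_order = (
    "firstpage.firstblock.open firstpage.textbox.push "
    "firstpage.textbox.type firstpage.textbox.submit"
)
wrong_order = "firstpage.textbox.submit firstpage.textbox.type"

has_match = pattern.search(in_order) is not None
has_match_wrong_order = pattern.search(wrong_order) is not None
```

With these inputs `has_match` is true and `has_match_wrong_order` is false. Because each character in the gap can be consumed by only one branch, the engine has far fewer alternatives to retry than with `.*`.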
