
I'm constructing a regex query using a for loop as follows:

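my_list = ['a-b', 'b-c']

# Build one alternative per list element, separated by '|'.
query = '^(?:'
for num, element in enumerate(my_list):
  values = element.split('-')
  if num > 0:
    query += '|'
  # Each value becomes a lookahead, so all values must appear in any order.
  for v in values:
    query += f'(?=.*{v})'
query += ')'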

However, I believe this is inefficient: repeated string concatenation can take quadratic time in the total length of the result, so it may not scale well.

How would I be able to convert this process to avoid this inefficiency?

I suspect the .join() method could help, but I'm not sure how to apply it in my case, since there are two nested loops.

philipxy
Ricardo Francois
  • "Quadratic runtime" relative to what? – Scott Hunter Jul 14 '21 at 18:41
  • There is one missing opening parenthesis – azro Jul 14 '21 at 18:43
  • Added the parenthesis, thanks @azro – Ricardo Francois Jul 14 '21 at 18:51
  • @ScottHunter I was told this method has a quadratic runtime cost for the total sequence length – Ricardo Francois Jul 14 '21 at 18:51
  • Please describe what this is trying to do. You will end up with `(?=.*a)(?=.*b)|(?=.*b)(?=.*c)`, which is a rather expensive regex, and since those are "non-consuming" matches, I can't really tell what you're trying to achieve. – Tim Roberts Jul 14 '21 at 18:57
  • @TimRoberts feel free to refer to https://stackoverflow.com/questions/68379545/how-to-check-if-all-items-in-a-group-are-contained-in-a-string-for-many-items if you're interested in what the method itself is trying to achieve – Ricardo Francois Jul 14 '21 at 18:58
  • Are you talking about the cost of *building* the regex or *using* it? B/c building it looks linear (there are N values in `my_list`, and all appear once). – Scott Hunter Jul 14 '21 at 19:04
  • Building the query itself using loops as this results in a repeated concatenation. As the sequences are immutable, that results in a new object each time. I believe a `.join()` approach would be more efficient but I'm not sure how to implement it – Ricardo Francois Jul 14 '21 at 19:05
  • Yes, it has quadratic behavior. If you want to check whether a string has "a" and "b" in any order, it is much better to do `if 'a' in str and 'b' in str:`. The post you cited used regex because SQL's string searching abilities aren't as rich as Python. – Tim Roberts Jul 14 '21 at 19:17
  • @TimRoberts: so *using* it is quadratic, not *building* it. – Scott Hunter Jul 14 '21 at 19:19
  • Yes, of course. Building it is linear. – Tim Roberts Jul 14 '21 at 19:21
  • Well, technically the append takes longer as the string gets longer, but unless you have tens of thousands of cases, that won't matter, and if you do, the regex will never finish anyway. ;) – Tim Roberts Jul 14 '21 at 19:30
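
The plain substring check suggested in the comments can be sketched roughly as below. The function name `matches_any_group` is hypothetical, and the sketch assumes single-line input text: the regex lookaheads use `.`, which by default does not cross newlines, whereas `in` searches the whole string.

```python
def matches_any_group(text, items):
    """Return True if every part of at least one 'x-y' group occurs in text."""
    return any(
        all(part in text for part in element.split('-'))
        for element in items
    )


my_list = ['a-b', 'b-c']
print(matches_any_group('xbxa', my_list))  # both 'a' and 'b' are present
print(matches_any_group('a', my_list))     # no group is fully present
```

This avoids building a regex at all, which may be the simpler option when the matching happens in Python rather than in SQL.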

1 Answer


This will mimic the above using joins; note that this will have no effect on the performance of using the query generated.

'^(?:' + "|".join(["".join(['(?=.*%s)' % v for v in element.split('-')]) for element in my_list]) + ")"
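
As a minimal sketch, the same idea can be split into two small functions so the loop version and the join version can be compared directly. The names `build_query_loop` and `build_query_join` are made up for illustration, and the values are assumed to contain no regex metacharacters (otherwise wrapping each value in `re.escape` would probably be wise).

```python
import re


def build_query_loop(items):
    """Loop version from the question, using repeated concatenation."""
    query = '^(?:'
    for num, element in enumerate(items):
        if num > 0:
            query += '|'
        for v in element.split('-'):
            query += f'(?=.*{v})'
    query += ')'
    return query


def build_query_join(items):
    """Join version: inner join builds the lookaheads, outer join adds '|'."""
    alternatives = (
        ''.join(f'(?=.*{v})' for v in element.split('-'))
        for element in items
    )
    return '^(?:' + '|'.join(alternatives) + ')'


my_list = ['a-b', 'b-c']
query = build_query_join(my_list)
print(query)  # ^(?:(?=.*a)(?=.*b)|(?=.*b)(?=.*c))
print(query == build_query_loop(my_list))  # True
print(bool(re.search(query, 'xbxa')))  # True, since 'a' and 'b' both occur
```

Each `join` assembles its result in a single pass over the pieces, so no intermediate strings are rebuilt on every step.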
Scott Hunter