I often want a regex to require that one thing appears BEFORE something else occurs. For example, I want to match strings that begin with a period, contain the word "car", and then contain a semicolon before the next period.

In the past I have solved this by splitting the range of allowable characters and putting the desired string in the middle. So for the above example the regex might be:

    \..{1,35}?car.{1,35}?\.

But this solution is undesirable because I don't REALLY want to allow up to 73 characters (35 + 3 + 35) between the two periods. I really only want to allow 50. But since I don't know whether the word "car" will occur at the very beginning or the very end, I have to compromise on what is tolerable before AND after it and settle on (in this example) 35 characters on each side.

This will ultimately be in Python, but I'm hoping the principle can be explained in a flavor-neutral fashion.


People have suggested that with this simple example, I should just check the string's length afterwards.

This will not work for my needs because I want to look for multiple strings within the string. So for example I want to look for:

Some string beginning with `[.;][\s"']{1,4}` and reaching a period within 150 characters, but before reaching that period finding (in no particular order) at least one colon, the word CAR, and a literal slash; and then, after that period, finding the word PineApple within 100 characters.

In an example like this, dropping in and out of regex to check string length would be burdensome. I'm not trying to force regex to do something it can't. I'm just asking if there is a way to get it working.

(Again, I'm not looking for someone to post a regex that achieves the result above; I'm looking for suggestions on how to use regex to solve the question as asked before the edit.)

nhahtdh
COMisHARD
  • Maybe replace the `{1,35}` at each end with a simple `.+` and then check the length of the resulting match? – TigerhawkT3 Sep 02 '15 at 23:30
  • I bet that would work. But how do I "check the length of the resulting match"? I could obviously do that outside of regex, but is there a regex way? – COMisHARD Sep 02 '15 at 23:31
  • Is there any other solution? – COMisHARD Sep 02 '15 at 23:35
  • `if len(m) <= 50` is going to be approximately 6.22 times clearer to read than any regex solution for the same (if you can find one). – TigerhawkT3 Sep 02 '15 at 23:39
  • @karakfa It's the opposite in this case. I check for many different matches using regex, basically one after another, and this is just one of them. Given my broader needs for this question, it is not practical to pop the result out. But I'm struggling to give a simplified example to explain why. In short, there are multiple things I need to check for, not just one. – COMisHARD Sep 02 '15 at 23:40
  • If you need to check the length of a string by using a regular expression rather than with a `len()` call, you may be facing an [XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – TigerhawkT3 Sep 02 '15 at 23:41
  • After you've checked the string for all the components you describe, do you want to check whether the whole match is below a certain length? Because if you want that, you just find the match and then check its length. Beyond that, I don't know exactly what total string lengths you need to check and why. Maybe you need to capture groups and check their lengths? – TigerhawkT3 Sep 03 '15 at 00:00

1 Answer

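See "How to use regex lookahead to limit the total length of input string". A negative lookahead at the start of the pattern, such as `(?!.{16,})`, rejects any input of 16 or more characters: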

    >>> import re
    >>> # (?!.{16,}) fails the match if 16 or more characters follow the start
    >>> p = re.compile(r'^(?!.{16,})(\..{1,35}?car.{1,35}?\.)+$')
    >>> p.match('.1car1.')
    <_sre.SRE_Match object at 0x100a776c0>
    >>> len('.1car123456789.')
    15
    >>> p.match('.1car123456789.')
    <_sre.SRE_Match object at 0x100a70be8>
    >>> p.match('.1car1234567890.')  # 16 characters: returns None, so nothing is printed
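The same lookahead idea may also work when the length limit applies to a region inside a longer string rather than to the whole input. The following is only a sketch under assumptions that are not in the question: the names `BOUNDED` and `MULTI` are hypothetical, the 50 and 150 character limits are counted from the end of the opening delimiter, and "before the next period" is expressed by excluding periods with `[^.]`. A positive lookahead checks the distance to the next period without consuming anything, and one extra lookahead per required item lets the items appear in any order.

```python
import re

# Hypothetical pattern for the simple case: a period, then "car" somewhere
# before the next period, with at most 50 characters between the two periods.
# The lookahead (?=[^.]{0,50}\.) checks the distance without consuming input.
BOUNDED = re.compile(r"\.(?=[^.]{0,50}\.)[^.]*?car[^.]*\.")

# Hypothetical pattern for the multi-condition case: one lookahead per
# required item, so the items may occur in any order before the period.
MULTI = re.compile(
    r"""[.;][\s"']{1,4}        # opening delimiter from the question
        (?=[^.]{0,150}\.)      # next period within 150 characters
        (?=[^.]*:)             # a colon before that period
        (?=[^.]*CAR)           # the word CAR before that period
        (?=[^.]*/)             # a literal slash before that period
        [^.]*\.                # consume up to and including the period
        (?=.{0,100}PineApple)  # PineApple starts within 100 characters
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    print(bool(BOUNDED.search("." + "a" * 47 + "car.")))  # 50 chars between periods
    print(bool(BOUNDED.search("." + "a" * 48 + "car.")))  # 51 chars between periods
    print(bool(MULTI.search("; a/b: CAR here. then PineApple")))
```

Because every condition is an assertion anchored at the same position, the pattern never has to guess how many characters fall before or after "car", which is what forced the 35/35 compromise in the question.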
Community
bmhkim