1

I am having trouble matching the pattern, "This program cannot be run" whenever the phrase is broken over multiple lines, e.g.:

This program cannot be run

T
his program cannot be run

Thi
s program cannot be run

.
.

This pr
ogram cannot be run

The pattern can be split onto two lines at any point. I have tried using /m and /s as well as anchors and boundaries but I cannot get it to work. I am at a loss as to what I am doing wrong. I even tried using \s after every character and even that won't match! The pattern must be PCRE formatted.

user2249813

3 Answers

4

s and m won't help you here. They only change the behavior of . and anchors, respectively. Anchors and boundaries won't help either, because they only assert that something is at a certain position.
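As a rough illustration of what the two modifiers do and do not change, here is a minimal sketch using Python's re module as a stand-in for PCRE (an assumption on my part; the meaning of s for . and of m for ^ and $ is the same in both). The sample string is made up.

```python
import re

# Hypothetical sample: the phrase is broken after "Thi".
text = "junk\nThi\ns program cannot be run"

# s (re.S / DOTALL) only changes ".": it may now match a newline.
dot_default = re.search(r"Thi.s", text)
dot_dotall = re.search(r"Thi.s", text, re.S)

# m (re.M / MULTILINE) only changes "^" and "$": they may now match at line breaks.
anchor_default = re.search(r"^Thi$", text)
anchor_multiline = re.search(r"^Thi$", text, re.M)

# Neither flag lets a literal pattern step over the newline characters.
literal = re.search(r"This program cannot be run", text, re.S | re.M)

print(dot_default is not None, dot_dotall is not None)
print(anchor_default is not None, anchor_multiline is not None)
print(literal is not None)
```

Only the searches that use . with s, or anchors with m, find anything. The literal phrase still fails with both flags set, because the newline is an extra character that nothing in the pattern consumes.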

The problem with all those approaches is that a line break introduces one or two new characters into the string (\n, \r or \r\n, depending on your system). Therefore, if you need a regex-only solution, you would have to allow a line break at every possible point:

/T[\r\n]*h[\r\n]*i[\r\n]*s[\r\n]* [\r\n]*p[\r\n]*.../

And so on.

If you can modify the input, it would be easier to remove line breaks first by replacing

/[\r\n]+/

with an empty string and then running the pattern you already have.
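A minimal sketch of that two-step approach, again in Python rather than PCRE (an assumption; the helper name contains_phrase is made up for illustration):

```python
import re

def contains_phrase(text, phrase="This program cannot be run"):
    """Return True if phrase occurs in text once all line breaks are removed."""
    # Step 1: replace every run of CR/LF characters with an empty string.
    flattened = re.sub(r"[\r\n]+", "", text)
    # Step 2: run the plain pattern against the flattened text.
    return re.search(re.escape(phrase), flattened) is not None

print(contains_phrase("Thi\r\ns program cannot be run in DOS mode"))
print(contains_phrase("This program can be run"))
```

One trade-off to keep in mind: any match offsets refer to the flattened copy, not to the original input.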

Martin Ender
  • @MikeM fair enough. I'm trying really hard, but I don't see a way to ensure that that isn't either really hacky or gets a lot uglier than what we already have. – Martin Ender Apr 19 '13 at 18:44
  • Unfortunately, this has to be regex only so it'll have to be messy but I tested this out and it works! I wish there were better resources online explaining the m and s and how those affect . Thank you. – user2249813 Apr 19 '13 at 19:15
  • @user2249813 have you had a look at www.regular-expressions.info? specifically [this page](http://www.regular-expressions.info/modifiers.html) – Martin Ender Apr 19 '13 at 19:34
2

If a newline character can appear at any point in the sought substring, you will need to add a corresponding character to match that newline in the regex.

Assuming the newline characters are always \n:

T\n?h\n?i\n?s\n? \n?p\n?r\n?o\n?g\n?r\n?a\n?m\n? \n?c\n?a\n?n\n?n\n?o\n?t\n? \n?b\n?e\n? \n?r\n?u\n?n
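The \n assumption matters. The sketch below (Python's re module standing in for PCRE, with made-up sample strings) checks the pattern above against Unix and Windows line endings, then derives a more tolerant variant by swapping each \n? for [\r\n]*.

```python
import re

# The pattern from this answer, copied as-is.
lf_pattern = (r"T\n?h\n?i\n?s\n? \n?p\n?r\n?o\n?g\n?r\n?a\n?m\n? "
              r"\n?c\n?a\n?n\n?n\n?o\n?t\n? \n?b\n?e\n? \n?r\n?u\n?n")

# Variant that also accepts "\r" and "\r\n" between characters.
crlf_pattern = lf_pattern.replace(r"\n?", r"[\r\n]*")

# Hypothetical inputs: same break position, different line endings.
unix_text = "Thi\ns program cannot be run"
windows_text = "Thi\r\ns program cannot be run"

for name, pattern in (("lf", lf_pattern), ("crlf", crlf_pattern)):
    print(name,
          re.search(pattern, unix_text) is not None,
          re.search(pattern, windows_text) is not None)
```

If the data may contain \r\n, as HTTP payloads copied from a Windows tool often do, the \n? form will silently miss the split phrase, which is one possible reason a pattern like this appears not to match.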
MikeM
  • I think I understand why /s and /m weren't applicable and indeed this solution is a bit messy but I think I understand why it has to be done like this. I can't get it to match. I am also testing it with an online regex tester (sanity check) and that too will not match it. HTTP/1.1 200 OK. Accept-Ranges: bytes. Cache-control: max-age=86400. Content-Type: application/octet-stream. Date: Fri, 19 Apr 2013 15:40:15 GMT. X-Cache: HIT. . MZ............@........................................!..L.!Th is program cannot be run in DOS mode... $........n.zY..)Y..)Y..).An)X..)6yh)B..)6y) – user2249813 Apr 19 '13 at 18:41
  • http://bokehman.com/regex_checker and http://www.regexplanet.com/advanced/java/index.html Sample data - http://pastebin.com/c3pFEGV0 – user2249813 Apr 19 '13 at 18:45
  • @user2249813. Works fine for me at that first link, with the content in the question or the pastebin content. – MikeM Apr 19 '13 at 18:51
  • @user2249813. Proof! [http://imgur.com/XyWFUdQ](http://imgur.com/XyWFUdQ). (It only works with the RAW paste data from pastebin). – MikeM Apr 19 '13 at 19:25
0

It looks horrible, and maybe someone can offer a better solution, but here it is in Python using the re.S flag:

>>> import re
>>> a = """
... This pr
... ogram cannot be run"""
>>> re.search("T[\n]*h[\n]*i[\n]*s[\n]* [\n]*p[\n]*r[\n]*o[\n]*",a,re.S)
<_sre.SRE_Match object at 0x7f9d746e9e68>

An easy way to build the regex if your string changes:

>>> a = "This program cannot be run"
>>> b = list(a)
>>> r = '[\r\n]*'.join(b)
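One caveat with joining the raw characters: if the phrase contains regex metacharacters such as . or (, they keep their special meaning. A hedged variant that escapes each character first might look like this (the function name split_tolerant_pattern and the sample phrase are made up for illustration):

```python
import re

def split_tolerant_pattern(phrase):
    """Build a regex that matches phrase with CR/LF runs allowed between characters."""
    # re.escape keeps metacharacters in the phrase literal;
    # the separator allows any number of "\r" or "\n" between neighbours.
    return r"[\r\n]*".join(re.escape(ch) for ch in phrase)

pattern = re.compile(split_tolerant_pattern("run in DOS mode."))

print(pattern.search("run in DOS mo\r\nde.") is not None)
print(pattern.search("run in DOS modeX") is not None)
```

Without the escaping step, the trailing . would match any character, so the second input would be reported as a match.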
pyInTheSky