
I haven't been able to figure this one out.

I need to match all of these strings by matching "whole" and its surrounding underscores (in a single regex):

  1. whole_anything
  2. anything_whole
  3. anything_whole_anything

but it must NOT match these:

  4. anythingwholeanything
  5. anything_wholeanything
  6. anythingwhole_anything

That means: write a regex that matches the phrase "whole" only if it has an underscore before it, after it, or both, and not if there are no underscores at all.

The following

preg_match("/(whole_|_whole_|_whole)/", $string)

is not a solution ;)

2015/02/09 Edit: added conditions 5. and 6. for clarification

Lukáš Řádek
  • Why isn't that a solution? Doesn't it work? – Barmar Feb 07 '15 at 00:30
  • `/(\b|_)whole(\b|_)/` – Jonathan Kuhn Feb 07 '15 at 00:30
  • @JonathanKuhn That will match when there's no underscore before or after. – Barmar Feb 07 '15 at 00:30
  • @Barmar Yes, it matches if there isn't anything around it, but that isn't one of the test cases. – Jonathan Kuhn Feb 07 '15 at 00:32
  • The question says _not if there are no underscores_. At least one of the underscores is required. – Barmar Feb 07 '15 at 00:32
  • Can you match just "whole"? Or must it have something before and/or after? – Jon Surrell Feb 07 '15 at 00:32
  • Sorry for being silent for a while. @Barmar ... that is not a solution, because this is a challenge and because when I want to change the keyword ("whole"), I must change it 3 times. The "whole" keyword is a "flag" inside a file name and must not be mistaken for other words in the filename that could contain the sequence "whole"... so the "whole" keyword must be either separated by an underscore from other chars or at the beginning/end of the string (filename) – Lukáš Řádek Feb 09 '15 at 19:54
  • What do you mean "this is a challenge"? It's a contest of some kind, not a real programming problem that you're having? – Barmar Feb 09 '15 at 19:58
  • It is a problem, yes, but of course there are many ways (known to me) to go around that. But since I really enjoy regex and this was giving me a headache, I was wondering if some bright guy on SO can figure it out :) – Lukáš Řádek Feb 09 '15 at 20:12

3 Answers


You could reduce the number of cases in the alternatives:

preg_match('/(_whole_?|whole_)/', $string);

If there's an underscore before, the underscore after is optional. But if there's no underscore before, the underscore after is required.

You can use a PHP variable to solve the problem of putting the word twice:

$word = preg_quote('whole', '/'); // also escape the "/" delimiter used in the pattern below
preg_match("/(_{$word}_?|{$word}_)/", $string);
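To see how this pattern behaves on the six strings from the question, here is a minimal sketch. The helper name buildFlagPattern and the $samples list are only illustrative assumptions, not part of the answer. As the comments below note, this pattern also matches anything_wholeanything and anythingwhole_anything, because each of them contains an underscore directly next to "whole".

```php
<?php
// Hypothetical helper (name is an assumption): builds the pattern from this
// answer for an arbitrary keyword, so the keyword is written only once.
function buildFlagPattern(string $word): string
{
    // Escape regex metacharacters and the "/" delimiter in the keyword.
    $quoted = preg_quote($word, '/');
    // Underscore before (underscore after optional), or underscore after.
    return "/(_{$quoted}_?|{$quoted}_)/";
}

$pattern = buildFlagPattern('whole');

// The six strings listed in the question.
$samples = [
    'whole_anything',
    'anything_whole',
    'anything_whole_anything',
    'anythingwholeanything',
    'anything_wholeanything',
    'anythingwhole_anything',
];

foreach ($samples as $sample) {
    $result = preg_match($pattern, $sample) === 1 ? 'match' : 'no match';
    printf("%-26s %s\n", $sample, $result);
}
```

Only anythingwholeanything is rejected here, so this is a sketch of the "underscore on at least one side" rule rather than of the stricter filename-flag rule described in the comments.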
Barmar
  • Thanks for the input, but ideally I would like a pattern with "whole" used only once :) and furthermore, this pattern would fail in cases 5 and 6, which I added for clarification. I know they weren't there at the time you answered, so sorry for that. I forgot to mention those. – Lukáš Řádek Feb 09 '15 at 20:01
  • I don't understand why cases 5 and 6 should fail. They each have underscore before or after. Is there an unstated requirement that if the character before or after is not an underscore, it has to be a word boundary? – Barmar Feb 09 '15 at 20:26
  • Cases 5 and 6 weren't there when I wrote the answer. I was just going by your description of the requirements. – Barmar Feb 09 '15 at 20:27

You could exclude all alphanumeric characters before and after. Unfortunately, you can't use \w, because _ is considered a word character.

([^a-zA-Z0-9])_?whole_?([^a-zA-Z0-9])

That will require a non-alphanumeric character before and after the match, and the underscore in front, behind, or both is optional. If no underscore exists, it still can't match when the word is directly preceded or followed by a letter or number. You could change the character classes to include special characters and the like.
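As a variation on the same idea, the character classes can be turned into a negative lookbehind and a negative lookahead, so that nothing around the keyword is consumed and the start or end of the string is accepted too. The following is only a sketch under that assumption; the function name hasStandaloneFlag is hypothetical. Note that it also accepts a bare "whole" with no underscores at all, which may or may not be wanted.

```php
<?php
// Hypothetical helper (name is an assumption): true when $word appears in
// $subject without a letter or digit directly before or after it.
function hasStandaloneFlag(string $word, string $subject): bool
{
    // Escape regex metacharacters and the "/" delimiter in the keyword.
    $quoted = preg_quote($word, '/');
    // (?<!...) : the previous character must not be alphanumeric.
    // (?!...)  : the next character must not be alphanumeric.
    // Both assertions also succeed at the start or end of the string.
    $pattern = "/(?<![a-zA-Z0-9]){$quoted}(?![a-zA-Z0-9])/";
    return preg_match($pattern, $subject) === 1;
}

$samples = [
    'whole_anything',
    'anything_whole',
    'anything_whole_anything',
    'anythingwholeanything',
    'anything_wholeanything',
    'anythingwhole_anything',
];

foreach ($samples as $sample) {
    $result = hasStandaloneFlag('whole', $sample) ? 'match' : 'no match';
    printf("%-26s %s\n", $sample, $result);
}
```

With lookarounds the optional underscores are no longer needed in the pattern, because an underscore is simply one of the characters the assertions allow, and the keyword appears only once.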

Ohgodwhy
  • You should probably use negative lookahead and lookbehind, so the non-alphanumeric characters aren't included in the match. This will also handle the case where the underscore is at the beginning or end of the line. – Barmar Feb 07 '15 at 00:36
  • @Barmar My RegEx Foo is not that strong. Look(ahead|behind)s aren't my strongest suit... unfortunately. – Ohgodwhy Feb 07 '15 at 00:49
  • Thanks for the input... But if I use that, would it match anything_whole_anything? I guess not, because there are no alphanum chars allowed, right? – Lukáš Řádek Feb 09 '15 at 20:05
  • I gave it a try and it almost works. It does not capture "whole_anything" though. But I really like that "whole" appears only once here :) – Lukáš Řádek Feb 09 '15 at 20:19

Another alternative. This way we check for the existence of a word boundary or _ both before and after whole, but we exclude the word whole by itself through a negative lookahead.

(?!\bwhole\b)((?:_|\b)whole(?:_|\b))

Regex Demo here.
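For a quick check in PHP, the pattern can be run against the six strings from the question plus a bare "whole". This is a minimal sketch; the function name matchesWithBoundary and the sample list are assumptions made for illustration only.

```php
<?php
// Hypothetical helper (name is an assumption): applies the pattern from this
// answer to a subject string.
function matchesWithBoundary(string $subject): bool
{
    // (?!\bwhole\b)     : reject "whole" standing alone as a complete word.
    // (?:_|\b)whole(...) : otherwise require an underscore or a word boundary
    //                      on each side of the keyword.
    $pattern = '/(?!\bwhole\b)((?:_|\b)whole(?:_|\b))/';
    return preg_match($pattern, $subject) === 1;
}

$samples = [
    'whole_anything',
    'anything_whole',
    'anything_whole_anything',
    'anythingwholeanything',
    'anything_wholeanything',
    'anythingwhole_anything',
    'whole',
];

foreach ($samples as $sample) {
    $result = matchesWithBoundary($sample) ? 'match' : 'no match';
    printf("%-26s %s\n", $sample, $result);
}
```

Because _ counts as a word character, \b never matches between a letter and an underscore, which is why the last three strings from the question and the bare keyword are rejected.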

David Faber