I'm confused by some basic regex logic. Using a simple example:

(one)(two)(three)

I want regex to catch:

onetwothree
onetwo
   twothree

but NOT

   two

and catching in groups (one)(two)(three).

I know I can use a positive lookbehind on 'two' so that it only matches when preceded by 'one':

(one)?((?<=one)two)(three)? 

but then I can no longer match 'twothree', because the lookbehind requires 'one' directly before 'two'.

The real world need is for currency:

group one:   [$¥£€₹]
group two:   ((?:\d{1,10}[,. ])*\d{1,10})
group three: ( ?(?:[$¥£€₹]|AUD|USD|GBP|EURO?S?\b))

so I want to get these results:

$20,000 AUD
$20,000
 20,000 AUD

but NOT

 20,000

help appreciated!

PS: I need the matches in groups (one)(two)(three) or (one)(two) or (two)(three).

Benedict Harris

3 Answers

Here's a pure regex response:

(ONE)?TWO(?(1)(THREE)?|THREE)

Using conditionals, you can check whether the first group matched. If it did, the last group is optional; if it did not, the last group is mandatory.

(ONE)?TWO(?(1)(THREE)?|THREE)
^     ^  ^^^^^   ^       ^
1     2    3     4       5

1: Try to match ONE. If you can't find it, no big deal.
2: You absolutely must have TWO.
3: If the first group (ONE) DID match, then...
4: ... use the first branch, where THREE is optional
5: Otherwise use the second branch, where THREE is mandatory

Because the first branch is optional, matching ONE makes THREE optional. If we miss ONE, then THREE is mandatory.

ONE
TWO
THREE
ONETWO       // Matches! (e.g: $20,000)
ONETHREE
TWOTHREE     // Matches! (e.g: 20,000 AUD)
ONETWOTHREE  // Matches! (e.g: $20,000 AUD)
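
To check the list above yourself, here is a minimal PHP sketch that runs the same pattern over each sample input. The variable names are mine, not part of the answer, and the pattern is left unanchored exactly as written above.

```php
<?php
// The conditional pattern from the answer, unchanged.
$pattern = '/(ONE)?TWO(?(1)(THREE)?|THREE)/';

// The sample inputs listed above.
$inputs = ['ONE', 'TWO', 'THREE', 'ONETWO', 'ONETHREE', 'TWOTHREE', 'ONETWOTHREE'];

$results = [];
foreach ($inputs as $input) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $results[$input] = preg_match($pattern, $input) === 1;
}

foreach ($results as $input => $matched) {
    echo str_pad($input, 12), $matched ? 'matches' : 'no match', PHP_EOL;
}
```

Only the three inputs marked as matching should report a match; a bare TWO fails because the conditional demands THREE whenever group 1 is unset.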

You can read more about conditional subpatterns in the PCRE pattern syntax section of the PHP manual.
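
Applied to the currency case from the question, the same idea might look like the sketch below. It wraps each part in its own capture group so the symbol, amount and trailing currency always land in groups 1, 2 and 3. The function name parse_money, the array keys, the ^ and $ anchors and the u modifier are my assumptions, not something from the question.

```php
<?php
// Building blocks taken from the question; the variable names are illustrative.
$symbol   = '[$¥£€₹]';
$amount   = '(?:\d{1,10}[,. ])*\d{1,10}';
$currency = ' ?(?:[$¥£€₹]|AUD|USD|GBP|EURO?S?\b)';

// Group 1: optional symbol. Group 2: amount. Group 3 wraps the conditional, so the
// trailing currency is optional when group 1 matched and mandatory when it did not.
// The u modifier lets the multibyte symbols be matched as whole characters.
$pattern = "/^($symbol)?($amount)((?(1)(?:$currency)?|$currency))$/u";

// Hypothetical helper: returns the three parts, or null when nothing matches.
function parse_money(string $pattern, string $text): ?array
{
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    return [
        'symbol'   => $m[1],        // empty string when there is no leading symbol
        'amount'   => $m[2],
        'currency' => trim($m[3]),  // empty string when there is no trailing currency
    ];
}

foreach (['$20,000 AUD', '$20,000', '20,000 AUD', '20,000'] as $text) {
    echo $text, ' => ';
    var_export(parse_money($pattern, $text));
    echo PHP_EOL;
}
```

Because group 3 always takes part in a successful match, the three keys stay in the same positions for every result, which is what the question's PS asks for.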

Addison
  • thats great!! thanks. I modified it a bit to get the capture groups capturing. `(ONE)?(TWO)((?(1)(?:THREE)?|THREE))` and updated your regex101 [here](https://regex101.com/r/HUfyTt/2) – Benedict Harris Mar 23 '21 at 14:36
  • what does "pure regex" mean? – eis Mar 24 '21 at 12:37
  • @eis - It just means I'm not using a mix of Regex and PHP to get a solution. It's just Regex, so it should also work for other people (even if they're using Python, Java, etc) – Addison Mar 25 '21 at 05:17
Honestly, I would discard any lookaheads/lookbehinds, define all the cases separately, and then combine them. It is more robust, easier to reason about and understand, and more effective.

So do

(^(groupone)(grouptwo)$)|(^(groupone)(grouptwo)(groupthree)$)|(^(grouptwo)(groupthree)$)

For example:

$groupone    = '[$¥£€₹]';
$grouptwo    = '(?:\d{1,10}[,. ])*\d{1,10}';
$groupthree  = ' ?([$¥£€₹]|AUD|USD|GBP|EURO)';
$caseone     = "^($groupone)($grouptwo)$";
$casetwo     = "^($groupone)($grouptwo)($groupthree)$";
$casethree   = "^($grouptwo)($groupthree)$";
// The u modifier treats the multibyte currency symbols as single characters.
$allcases    = "/($caseone)|($casetwo)|($casethree)/u";

preg_match($allcases, '20,000 AUD', $matches);
print_r($matches); // matches, preg_match returns 1

preg_match($allcases, '$20,000', $matches);
print_r($matches); // matches, preg_match returns 1

preg_match($allcases, '$20,000 AUD', $matches);
print_r($matches); // matches, preg_match returns 1

preg_match($allcases, '20,000', $matches);
print_r($matches); // empty, preg_match returns 0

To make the results look nicer (skipping empty results, duplicates, extra whitespace, etc.), I'd additionally use a cleanup function:

<?php
$groupone    = '[$¥£€₹]';
$grouptwo    = '(?:\d{1,10}[,. ])*\d{1,10}';
$groupthree  = ' ?([$¥£€₹]|AUD|USD|GBP|EURO)';
$caseone     = "^($groupone)($grouptwo)$";
$casetwo     = "^($grouptwo)($groupthree)$";
$casethree   = "^($groupone)($grouptwo)($groupthree)$";
// The u modifier treats the multibyte currency symbols as single characters.
$allcases    = "/($caseone)|($casetwo)|($casethree)/u";

function cleanup($arr) {
  # trim leading and trailing whitespace
  $newarr = array_map('trim', $arr);
  # remove empty values
  $newarr = array_filter($newarr, function($value) { return $value !== ''; });
  # remove duplicates
  $newarr = array_unique($newarr);
  # we're only interested in the values
  return array_values($newarr);
}

preg_match($allcases, '20,000 AUD', $matches);
print_r(cleanup($matches));

preg_match($allcases, '$20,000', $matches);
print_r(cleanup($matches));

preg_match($allcases, '$20,000 AUD', $matches);
print_r(cleanup($matches));

preg_match($allcases, '20,000', $matches);
print_r(cleanup($matches));

Which would get you results like

Array
(
    [0] => 20,000 AUD
    [1] => 20,000
    [2] => AUD
)
Array
(
    [0] => $20,000
    [1] => $
    [2] => 20,000
)
Array
(
    [0] => $20,000 AUD
    [1] => $
    [2] => 20,000
    [3] => AUD
)
Array
(
)

Edit: if you want the groups to be the same, you can use named groups like

$groupone    = '(?<currencyprefix>[$¥£€₹])';
$grouptwo    = '((?:\d{1,10}[,. ])*\d{1,10})';
$groupthree  = ' ?(?<currencypostfix>([$¥£€₹]|AUD|USD|GBP|EURO))';
$caseone     = "^$groupone$grouptwo$";
$casetwo     = "^$grouptwo$groupthree$";
$casethree   = "^$groupone$grouptwo$groupthree$";
// The u modifier treats the multibyte currency symbols as single characters.
$allcases    = "/(?J)($caseone)|($casetwo)|($casethree)/u";

function cleanup($arr) {
  // A trailing group that did not take part in the match is absent from $arr, so default it to null.
  $currencyprefix  = $arr['currencyprefix'] ?? null;
  $currencypostfix = $arr['currencypostfix'] ?? null;

  return array($currencyprefix, $currencypostfix);
}

if (preg_match($allcases, '20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '20,000', $matches))
  print_r(cleanup($matches));

Which would get you

Array
(
    [0] =>
    [1] => AUD
)
Array
(
    [0] => $
    [1] =>
)
Array
(
    [0] => $
    [1] => AUD
)

Or, use named keys in end results too:

$groupone    = '(?<currencyprefix>[$¥£€₹])';
$grouptwo    = '((?:\d{1,10}[,. ])*\d{1,10})';
$groupthree  = ' ?(?<currencypostfix>([$¥£€₹]|AUD|USD|GBP|EURO))';
$caseone     = "^$groupone$grouptwo$";
$casetwo     = "^$grouptwo$groupthree$";
$casethree   = "^$groupone$grouptwo$groupthree$";
// The u modifier treats the multibyte currency symbols as single characters.
$allcases    = "/(?J)($caseone)|($casetwo)|($casethree)/u";

function cleanup($arr) {
  $newarr = array_filter($arr, function($var){ return !empty($var); });
  return array_filter($newarr, "is_string", ARRAY_FILTER_USE_KEY);
}

if (preg_match($allcases, '20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '$20,000 AUD', $matches))
  print_r(cleanup($matches));

if (preg_match($allcases, '20,000', $matches))
  print_r(cleanup($matches));

With results:

Array
(
    [currencypostfix] => AUD
)
Array
(
    [currencyprefix] => $
)
Array
(
    [currencyprefix] => $
    [currencypostfix] => AUD
)
eis
  • Ahh, thanks! Another method. Yes, I see how it's clean without empty key values. However, in your example above, AUD is [2] in the first array and [3] in the third. I need "AUD" to land in the same group consistently across results. – Benedict Harris Mar 24 '21 at 05:46
  • @BenedictHarris ok, added to my answer – eis Mar 24 '21 at 08:19
You could use an alternation with an optional group.

\bonetwo(?:three)?|twothree\b
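
As a quick check, this small PHP sketch runs that pattern against the four inputs from the question. The variable names are illustrative only.

```php
<?php
// The alternation from above: 'onetwo' with an optional 'three', or 'twothree'.
$pattern = '/\bonetwo(?:three)?|twothree\b/';

$inputs = ['onetwothree', 'onetwo', 'twothree', 'two'];

$found = [];
foreach ($inputs as $input) {
    // Store the matched text, or null when the pattern does not match at all.
    $found[$input] = preg_match($pattern, $input, $m) === 1 ? $m[0] : null;
}

var_export($found);
```

Note that this version has no capture groups, so it only tells you whether and what it matched, not which part was which.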

An example with named capture groups and the J flag to allow duplicate subpattern names:

(?P<symbol>[$¥£€₹])(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)(?:\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b)?\b|(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b

$strings = [
    "$20,000 AUD",
    "$20,000",
    "20,000 AUD",
    "$",
    "20,000",
    "AUD"
];
// The u modifier makes the multibyte symbols in [$¥£€₹] match as whole characters.
$re = '/(?P<symbol>[$¥£€₹])(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)(?:\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b)?\b|(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)\h+(?P<currency>AUD|USD|GBP|EURO?S?)\b/Ju';
foreach ($strings as $s) {
    $m = preg_match($re, $s, $matches);
    if ($m) {
        print_r($matches);
    }
}

Output

Array
(
    [0] => $20,000 AUD
    [symbol] => $
    [1] => $
    [amount] => 20,000
    [2] => 20,000
    [currency] => AUD
    [3] => AUD
)
Array
(
    [0] => $20,000
    [symbol] => $
    [1] => $
    [amount] => 20,000
    [2] => 20,000
)
Array
(
    [0] => 20,000 AUD
    [symbol] => 
    [1] => 
    [amount] => 20,000
    [2] => 
    [currency] => AUD
    [3] => 
    [4] => 20,000
    [5] => AUD
)

Or, if you keep only the named keys and drop the numerical ones, the output becomes:

Array
(
    [symbol] => $
    [amount] => 20,000
    [currency] => AUD
)
Array
(
    [symbol] => $
    [amount] => 20,000
)
Array
(
    [symbol] => 
    [amount] => 20,000
    [currency] => AUD
)
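
One way to drop the numerical keys is to filter the matches array by key type. This is only a sketch: the helper name named_groups is made up, the pattern is rebuilt from parts purely for readability, and the u modifier is my addition.

```php
<?php
// Same pattern as above, assembled from parts to keep the lines short.
$amount   = '(?P<amount>\d{1,10}(?:[.,]\d{1,10})?)';
$currency = '(?P<currency>AUD|USD|GBP|EURO?S?)\b';
$re = '/(?P<symbol>[$¥£€₹])' . $amount . '(?:\h+' . $currency . ')?\b|'
    . $amount . '\h+' . $currency . '/Ju';

// Hypothetical helper: keep only the entries whose key is a string (the named groups).
function named_groups(array $matches): array
{
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

foreach (['$20,000 AUD', '$20,000', '20,000 AUD', '20,000'] as $s) {
    if (preg_match($re, $s, $matches)) {
        print_r(named_groups($matches));
    }
}
```

Empty values are kept on purpose, so an entry such as symbol still appears when the second alternative matched.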
The fourth bird