I have this string:

"Common Waxbill - Estrilda astrild"

How can I write 2 separate regexes for the words before and after the hyphen? The output I would want is:

"Common Waxbill" 

and

"Estrilda astrild"
Oliver Oliver

4 Answers

This is quite simple:

.*(?= - )     # matches everything before " - "
(?<= - ).*    # matches everything after " - "

See this tutorial on lookaround assertions.
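
As a rough illustration, here is how the two patterns might be applied in JavaScript. This is only a sketch: it assumes an engine with lookbehind support (ECMAScript 2018 or later), and the helper names beforeSeparator and afterSeparator are made up for this example.

```javascript
// Sketch only: the lookbehind pattern needs an ES2018+ engine.
// Function names are hypothetical, chosen for this example.

// Everything before " - " (lookahead works in all JavaScript engines).
function beforeSeparator(text) {
  const match = text.match(/.*(?= - )/);
  return match ? match[0] : null;
}

// Everything after " - " (lookbehind requires ES2018 or later).
function afterSeparator(text) {
  const match = text.match(/(?<= - ).*/);
  return match ? match[0] : null;
}

const input = "Common Waxbill - Estrilda astrild";
console.log(beforeSeparator(input)); // Common Waxbill
console.log(afterSeparator(input));  // Estrilda astrild
```

Both helpers return null when the string contains no " - " separator, so the caller can tell a missing separator apart from an empty match.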

Tim Pietzcker
  • When I used the code for matching everything after, I got this error: SyntaxError: Invalid regular expression: /(?<= - ).*/: Invalid group. I'm using the Chrome extension Web Scraper, which uses JS for its regex. – Oliver Oliver May 12 '16 at 15:22
  • JavaScript does not have lookbehinds. – Lucas Araujo May 12 '16 at 15:25
  • As of ECMAScript 2018, that restriction has been lifted. JS now supports lookbehind assertions, even those of indefinite length. – Tim Pietzcker May 15 '18 at 05:36

If you cannot use lookbehinds, but your string is always in the same format and cannot contain more than a single hyphen, you could use

^[^-]*[^ -] for the first one and \w[^-]*$ for the second one (or [^ -][^-]*$ if the first non-space character after the hyphen is not necessarily a word character).

A little bit of explanation: ^[^-]*[^ -] matches the start of the string (anchor ^), followed by any number of characters that are not a hyphen, and finally a character that is neither a hyphen nor a space (just to exclude the trailing space from the match).

[^ -][^-]*$ takes the same approach the other way around: it first matches a character that is neither a space nor a hyphen, followed by any number of characters that are not a hyphen, and finally the end of the string (anchor $). \w[^-]*$ is basically the same, but uses the stricter \w instead of [^ -]. Again, this excludes the whitespace after the hyphen from the match.
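
Since the asker's tool uses JavaScript regexes, here is a small sketch of these lookbehind-free patterns in that language. The function names firstPart and secondPart are invented for the example, and it assumes the input contains exactly one hyphen, as described above.

```javascript
// Sketch only: assumes exactly one hyphen in the input.
// Function names are hypothetical, chosen for this example.
const BEFORE_HYPHEN = /^[^-]*[^ -]/; // start of string up to the last non-space before the hyphen
const AFTER_HYPHEN = /[^ -][^-]*$/;  // first non-space after the hyphen up to end of string

function firstPart(text) {
  const match = text.match(BEFORE_HYPHEN);
  return match ? match[0] : null;
}

function secondPart(text) {
  const match = text.match(AFTER_HYPHEN);
  return match ? match[0] : null;
}

const input = "Common Waxbill - Estrilda astrild";
console.log(firstPart(input));  // Common Waxbill
console.log(secondPart(input)); // Estrilda astrild
```

Neither pattern relies on lookbehind, so this should also work in engines that predate ECMAScript 2018.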

Sebastian Proske

Another solution is to string split on the hyphen and remove white space.
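
A minimal sketch of that idea in JavaScript, assuming the hyphen only ever appears as the separator (a hyphenated name would be split too). The function name splitOnHyphen is hypothetical.

```javascript
// Sketch only: splits on every hyphen, so it assumes the hyphen
// appears only as the separator between the two names.
function splitOnHyphen(text) {
  return text.split("-").map((part) => part.trim());
}

const [commonName, scientificName] = splitOnHyphen("Common Waxbill - Estrilda astrild");
console.log(commonName);     // Common Waxbill
console.log(scientificName); // Estrilda astrild
```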

Oliver Oliver

Two alternate methods

The main challenge of your Question is that you want two separate items, which means the process depends on the host language. RegEx itself does not split or separate a string; it only describes the pattern we are looking for. The language you are using performs the actual separation. My answer gets your results in PHP, but other languages should have comparable solutions.

If you want to just do the job in your Question, and if you're using PHP...

Method 1: explode("-", $list); -> $array[]

This is useful if your list is longer than two items:

<?php
// Generate our list
$list = "Common Waxbill - Estrilda astrild";
$item_arr = explode("-", $list);

// Iterate each
foreach($item_arr as $item) {
  echo $item.'<br>';
}

// See what we have
echo '
<pre>Access array directly:</pre>'.
'<pre>'.$item_arr[0].'x <--notice the trailing space</pre>'.
'<pre>'.$item_arr[1].' <--notice the preceding space</pre>';

...You could clean up each item and reassign them to a new array with trim(). This would get the text your Question asked for (no extra spaces before or after)...

// Create a workable array
$clean_arr = []; // Initialize the array before filling it
foreach($item_arr as $item) {
  $clean_arr[] = trim($item); // Append each item with surrounding spaces removed
}

// See what we have
echo '
<pre>Access after cleaning:</pre>'.
'<pre>'.$clean_arr[0].'x <--no space</pre>'.
'<pre>'.$clean_arr[1].' <--no space</pre>';
?>

Output:

Common Waxbill

Estrilda astrild

Access array directly:

Common Waxbill x <--notice the trailing space

 Estrilda astrild <--notice the preceding space

Access after cleaning:

Common Waxbillx <--no space

Estrilda astrild <--no space

Method 2: substr(strrpos()) & substr(strpos())

This is useful if your list will only have two items:

<?php
// Generate our list
$list = "Common Waxbill - Estrilda astrild";

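// Start splitting
// Note: $first_item holds the text AFTER the last hyphen,
// and $second_item holds the text BEFORE the first hyphen
$first_item = trim(substr($list, strrpos($list, '-') + 1));
$second_item = trim(substr($list, 0, strpos($list, '-')));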

// See what we have
echo "<pre>substr():</pre>
<pre>$first_item</pre>
<pre>$second_item</pre>
";
?>

Output:

substr():

Estrilda astrild

Common Waxbill

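Note that strrpos() and strpos() are different functions: strpos() returns the position of the first occurrence of the hyphen, while strrpos() returns the position of the last. With a single hyphen in the string, both return the same position.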
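
For comparison, the same position-based approach might look like this in JavaScript, where indexOf() and lastIndexOf() play roughly the roles of strpos() and strrpos(). This is a sketch, not a translation of the PHP above, and the function name splitAroundHyphen is hypothetical.

```javascript
// Sketch only: takes the text before the first hyphen and the text
// after the last hyphen, trimming surrounding whitespace from both.
function splitAroundHyphen(text) {
  const first = text.indexOf("-");    // position of the first hyphen, or -1
  const last = text.lastIndexOf("-"); // position of the last hyphen, or -1
  if (first === -1) {
    return null; // no hyphen, nothing to split
  }
  return {
    before: text.slice(0, first).trim(),
    after: text.slice(last + 1).trim(),
  };
}

const result = splitAroundHyphen("Common Waxbill - Estrilda astrild");
console.log(result.before); // Common Waxbill
console.log(result.after);  // Estrilda astrild
```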

If you're not using PHP, but you want to do the job in some other language without depending on RegEx, knowing the language would be helpful.

Generally, programming languages come with tools for jobs like this out of the box, which is part of why people choose the languages they do.

Jesse