I need to split a string into two using a delimiter character. All I have to do is use the explode() function... I know.

But here is what I'm trying to do: I need to split a string using a delimiter but if the delimiter is enclosed in quotes it should be ignored.

Let's say my delimiter is a hyphen (-) and I need to split the following string:

The ‘big-yellow’ house-is near the lake

The first hyphen must be ignored because it is in quotes, therefore I would end up with two strings like these:

1. The ‘big-yellow’ house
2. is near the lake

And it also should be able to detect escaped quotes.

E.g.: He doesn\’t like it because-he isn\’t from here.

In this case the hyphen is not within quotes therefore the string should be split.

Any thoughts?

  • To skip escaped quotes, just combine the reference with a lookbehind `(?<!\\\\)'` – mario Feb 02 '16 at 18:13
  • To detect a `-` occurring inside a quoted word, I suggest using lookaround assertions, so that the hyphen is preceded and followed by characters before **meeting** a _quote_. –  Feb 02 '16 at 18:14
  • This is not a duplicate, as the other question doesn't deal with escaped quotes at all. Also, the answers there don't have good explanations of how it works. – Will Feb 02 '16 at 18:27
  • I have made an [IDEONE](https://ideone.com/sjYmPm) demo. See if this is what you need. –  Feb 02 '16 at 18:34
  • As the question has been closed, here is an answer on [ideone](http://ideone.com/2upnoV): you can use `‘[^’]+’(*SKIP)(*FAIL)|-` as a splitting delimiter. The trick is to specify everything that should **not** be found before the alternation (`|`). – Jan Feb 02 '16 at 18:35
  • noob and Jan: your regexes do not support escaped double quotes. Pablo, you have regular straight single quotes, not curly ones, right? – Wiktor Stribiżew Feb 02 '16 at 18:50
  • Yes Wiktor... I'm using only single quotes... –  Feb 02 '16 at 19:16
  • @PabloB: Try [`"~'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'(*SKIP)(?!)|-~"`](https://regex101.com/r/eF8tA4/1). See [demo](http://ideone.com/rR1DvQ). – Wiktor Stribiżew Feb 02 '16 at 20:35

2 Answers

You may use

'[^'\\]*(?:\\.[^'\\]*)*'(*SKIP)(?!)|-

See regex demo

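The '[^'\\]*(?:\\.[^'\\]*)*' part matches a single-quoted substring, including any backslash-escaped characters inside it. (*SKIP)(?!) then discards that match and forces the regex engine to resume searching after it (at the last index + match length), so a hyphen inside quotes is never used as a split point. The - alternative matches the remaining hyphens, which become the delimiters.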

And here is an IDEONE demo:

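// Skip single-quoted substrings (backslash escapes allowed), split on the remaining hyphens.
$re = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'(*SKIP)(?!)|-/";
// In a double-quoted PHP string, \' stays a literal backslash followed by a quote.
$strs = array("The 'big-yellow' house-is near the lake", "He doesn\'t like it because-he isn\'t from here.");
foreach ($strs as $str) {
    $result = preg_split($re, $str);
    print_r($result);
}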

Output:

Array
(
    [0] => The 'big-yellow' house
    [1] => is near the lake
)
Array
(
    [0] => He doesn\'t like it because
    [1] => he isn\'t from here.
)
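Since the question asks for exactly two parts and an arbitrary delimiter character, the same pattern could be wrapped roughly as follows. This is only a sketch: the function name split_outside_quotes and its $limit parameter are made up for illustration and are not part of the answer. It relies on preg_quote() to escape the delimiter and on the limit argument of preg_split(). Like the pattern above, it does not check whether the opening quote is itself escaped.

```php
<?php
// Hypothetical helper (not from the answer): split $str on $delimiter,
// ignoring delimiters that sit inside single-quoted substrings.
function split_outside_quotes(string $str, string $delimiter, int $limit = -1): array
{
    // A single-quoted run in which a backslash escapes the next character.
    $quoted = "'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'";
    // (*SKIP)(?!) discards the quoted run; the escaped delimiter is the real split point.
    $pattern = '/' . $quoted . '(*SKIP)(?!)|' . preg_quote($delimiter, '/') . '/';
    return preg_split($pattern, $str, $limit);
}

// A limit of 2 splits only at the first unquoted delimiter.
print_r(split_outside_quotes("The 'big-yellow' house-is near-the lake", '-', 2));
```

With a limit of 2, the second unquoted hyphen stays in the last element, so the call returns "The 'big-yellow' house" and "is near-the lake".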

Wiktor Stribiżew

Maybe something like this?

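function fsplit($str, $delimiter)
{
    $result = array();
    $inside_quote = false;
    $last_index = 0;
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++)
    {
        if ($str[$i] == '\\')
        {
            // Skip the escaped character so that \' does not toggle the quote state.
            $i++;
        }
        elseif ($str[$i] == $delimiter and !$inside_quote)
        {
            array_push($result, substr($str, $last_index, $i - $last_index));
            $last_index = $i + 1;
        }
        elseif ($str[$i] == "'")
        {
            $inside_quote = !$inside_quote;
        }
    }

    // Append whatever follows the last delimiter (the cast covers PHP < 8, where substr() may return false).
    array_push($result, (string) substr($str, $last_index));

    return $result;
}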
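
If only two parts are needed, as in the question, a character scan can stop at the first delimiter found outside quotes. The following is a minimal sketch of that idea, assuming straight single quotes and a backslash as the escape character. The name split_once_outside_quotes is hypothetical and does not come from the answer.

```php
<?php
// Hypothetical helper: split $str in two at the first $delimiter
// that is not inside a single-quoted substring.
function split_once_outside_quotes(string $str, string $delimiter): array
{
    $inside_quote = false;
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $char = $str[$i];
        if ($char === '\\') {
            $i++; // skip the escaped character, e.g. the quote in \'
        } elseif ($char === "'") {
            $inside_quote = !$inside_quote;
        } elseif ($char === $delimiter && !$inside_quote) {
            // The cast covers PHP < 8, where substr() may return false.
            return array(substr($str, 0, $i), (string) substr($str, $i + 1));
        }
    }
    return array($str); // no unquoted delimiter found
}

print_r(split_once_outside_quotes("He doesn\'t like it because-he isn\'t from here.", '-'));
```

Because the backslash branch skips the character that follows it, the escaped quotes in the example never toggle the quote state, and the string is split at the hyphen.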
Vahagn