
Here is an example string:

$string = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" /> amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';

I want to split the string into an array so that it is split wherever whitespace or an HTML tag occurs (ignoring whitespace inside an HTML tag). For example:

Array
(
    [0] => <strong>
    [1] => Lorem
    [2] => ipsum
    [3] => dolor
    [4] => </strong>
    [5] => sit
    [6] => <img src="test.png" />
    [7] => amet
    [8] => <span class="test" style="color:red">
    [9] => consec
    [10] => <i>
    [11] => tet
    [12] => </i>
    [13] => uer
    [14] => </span>
    [15] => .
)

But I am unable to achieve this. I tried preg_split, but I think my regular expressions are wrong. Below are some expressions I tried; the results are not what I want.

$chars = preg_split('/(<[^>]*[^\/]>)/i', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

/* Results */

Array
(
    [0] => <strong>
    [1] => Lorem ipsum dolor
    [2] => </strong>
    [3] =>  sit <img src="test.png" /> amet 
    [4] => <span class="test" style="color:red">
    [5] => consec
    [6] => <i>
    [7] => tet
    [8] => </i>
    [9] => uer
    [10] => </span>
    [11] => .
)

The result of another regular expression is:

$chars = preg_split('/\s+(?![^<>]*>)/x', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

/* Results */
Array
(
    [0] => <strong>Lorem
    [1] => ipsum
    [2] => dolor</strong>
    [3] => sit
    [4] => <img src="test.png" />
    [5] => amet
    [6] => <span class="test" style="color:red">consec<i>tet</i>uer</span>.
)

The result of a third expression (quite close) is:

$chars = preg_split('/\s*(<[^>]*>)/i', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

/* Results */
Array
(
    [0] => <strong>
    [1] => Lorem ipsum dolor
    [2] => </strong>
    [3] =>  sit
    [4] => <img src="test.png" />
    [5] =>  amet
    [6] => <span class="test" style="color:red">
    [7] => consec
    [8] => <i>
    [9] => tet
    [10] => </i>
    [11] => uer
    [12] => </span>
    [13] => .
)
Armaan

1 Answer

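You're almost there. Change <[^>]*> to the more specific <\/?\w+[^<>]*>, then add an alternation for whitespace, |\s+, outside the capturing group so that tags are kept as tokens while whitespace is discarded. You don't need the i flag either, since the pattern contains no literal letters: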

$chars = preg_split('/(<\/?\w+[^<>]*>)|\s+/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
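
For completeness, here is a minimal runnable sketch of that call applied to the example string from the question. The helper name split_html_tokens is hypothetical and used only for illustration; the pattern and flags are the ones shown above, with -1 passed as the limit as in the question's own attempts.

```php
<?php
// Hypothetical helper (the name is not from the thread): splits on HTML tags
// and on runs of whitespace, keeping the tags and dropping the whitespace.
function split_html_tokens(string $html): array
{
    $tokens = preg_split(
        '/(<\/?\w+[^<>]*>)|\s+/',   // group 1 = a tag (kept); \s+ = separator (discarded)
        $html,
        -1,                          // no limit on the number of pieces
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );

    // preg_split returns false on a regex engine error.
    if ($tokens === false) {
        throw new RuntimeException('preg_split failed, error code ' . preg_last_error());
    }

    return $tokens;
}

$string = '<strong>Lorem ipsum dolor</strong> sit <img src="test.png" /> amet <span class="test" style="color:red">consec<i>tet</i>uer</span>.';

print_r(split_html_tokens($string));
```

On the example string this should produce the 16-element array requested in the question. Note that the pattern assumes no attribute value contains a literal < or >; markup like that would likely need a real HTML parser rather than a regular expression.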
revo