
I have spent the last hour looking for an answer but haven't found one yet, so I'm asking here.

I need a way (probably a regex, but anything else, such as explode, is fine) to cut a sentence like the following into parts, all in the same array:

This is the first part, this is the second part; this is the third part! this is the fourth part? again - and again - until the sentence is over.

I want an array with the following entries (without any spaces before or after the punctuation marks, please):

  • [0] => "This is the first part"
  • [1] => "this is the second part"
  • [2] => "this is the third part"
  • [3] => "this is the fourth part"
  • [4] => "again"
  • [5] => "and again"
  • [6] => "until the sentence is over"

EDIT: Sorry, the example above is in English, but the solution should be able to handle a whole variety of scripts (all of Unicode, basically).

Thanks a lot!

Julien
  • perhaps using preg_split('/\.!\?/',$mysentence) with a regexp using punctuation marks? – Mark Baker Oct 23 '13 at 09:08
  • Sorry, I tried but it doesn't seem to work (and I have been so bad at figuring out regex that I don't know how to add punctuation marks to this) – Julien Oct 23 '13 at 09:20

3 Answers


I found a solution here

Here is my approach for exploding a string on multiple delimiters.

<?php

// $delimiters has to be an array
// $string has to be a string

function multiexplode ($delimiters,$string) {

    $ready = str_replace($delimiters, $delimiters[0], $string);
    $launch = explode($delimiters[0], $ready);
    return  $launch;
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);

print_r($exploded);

//And output will be like this:
// Array
// (
//    [0] => here is a sample
//    [1] =>  this text
//    [2] =>  and this will be exploded
//    [3] =>  this also
//    [4] =>  this one too
//    [5] => )
// )

?>
Nauphal
  • Thanks, it works quite well but two problems remain to be solved: the spaces after (and certainly the spaces which *could* exist before) the punctuation marks remain, and the full stop creates an empty entry at the end of the array. – Julien Oct 23 '13 at 09:23
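
Both points raised in the comment above (leftover spaces and the empty trailing entry) can be handled after the split. The following is only a minimal sketch: `multiexplode_clean` is a hypothetical name, not part of the original answer, and the scalar type hints assume PHP 7 or later. It trims each piece and then drops the empty ones.

```php
<?php

// Hypothetical helper (not from the original answer): same idea as
// multiexplode(), followed by trimming and removal of empty entries.
function multiexplode_clean(array $delimiters, string $string): array
{
    // Normalise every delimiter to the first one, then split once.
    $ready = str_replace($delimiters, $delimiters[0], $string);
    $parts = explode($delimiters[0], $ready);

    // Strip surrounding whitespace from each piece.
    $parts = array_map('trim', $parts);

    // Drop empty strings; array_values() reindexes the result from 0.
    return array_values(array_filter($parts, function ($part) {
        return $part !== '';
    }));
}

$text = 'This is the first part, this is the second part; this is the third part!';
print_r(multiexplode_clean(array(',', ';', '!'), $text));
```

Note that `trim()` with its default arguments only strips ASCII whitespace, so text using other space characters (for example a no-break space) would need a different cleanup step.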

A single preg_split can do the job:

$s = 'This is the first part, this is the second part; this is the third part! this is the fourth part? again - and again - until the sentence is over.';
print_r(preg_split('/\s*[,:;!?.-]\s*/u', $s, -1, PREG_SPLIT_NO_EMPTY));

OUTPUT:

Array
(
    [0] => This is the first part
    [1] => this is the second part
    [2] => this is the third part
    [3] => this is the fourth part
    [4] => again
    [5] => and again
    [6] => until the sentence is over
)
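
For the Unicode requirement added to the question, one option is to swap the explicit character class for the `\p{P}` property (any Unicode punctuation character), which PCRE supports together with the `u` modifier. This is a sketch under that assumption rather than part of the original answer, and `split_on_punctuation` is a hypothetical name. Be aware that `\p{P}` also matches apostrophes and hyphens inside words, so it may split more eagerly than intended.

```php
<?php

// Hypothetical helper: split on any run of Unicode punctuation,
// swallowing the whitespace around it.
function split_on_punctuation(string $sentence): array
{
    // \p{P} matches any Unicode punctuation character;
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    $parts = preg_split('/\s*\p{P}+\s*/u', $sentence, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (e.g. invalid UTF-8 input).
    return $parts === false ? array() : $parts;
}

print_r(split_on_punctuation('Это первая часть, это вторая часть; это третья часть!'));
```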
anubhava

Try using this:

$parts = preg_split("/[^A-Z\s]+/i", $string);
var_dump($parts);
Mina
  • Sorry I just edited the question - It should be able to handle much more than just the limited set of 26 letters. And the punctuation should also be Unicode punctuation... :/ – Julien Oct 23 '13 at 09:14
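
The same "split on everything that is not a letter or a space" idea can be extended beyond A-Z with Unicode properties. What follows is only a sketch, with a hypothetical function name, assuming that letters (`\p{L}`), combining marks (`\p{M}`) and numbers (`\p{N}`) should all be kept. Unlike a punctuation-only pattern, it also treats symbols such as `$` or `+` as separators.

```php
<?php

// Hypothetical helper: treat any run of characters that are not letters,
// combining marks, numbers or whitespace as a separator.
function split_on_non_letters(string $sentence): array
{
    $parts = preg_split('/[^\p{L}\p{M}\p{N}\s]+/u', $sentence);

    // preg_split() returns false on failure (e.g. invalid UTF-8 input).
    if ($parts === false) {
        return array();
    }

    // The pattern leaves the surrounding spaces in place, so trim each
    // piece and then drop the ones that end up empty.
    $parts = array_map('trim', $parts);

    return array_values(array_filter($parts, function ($part) {
        return $part !== '';
    }));
}

print_r(split_on_non_letters('Ceci est la première partie, ceci est la seconde; et voilà!'));
```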