I have this text:
A man’s jacket is of green color. He – the biggest star in modern history – rides bikes very fast (230 km per hour). How is it possible?! What kind of bike is he using? The semi-automatic gear of his bike, which is quite expensive, significantly helps to reach that speed. Some (or maybe many) claim that he is the fastest in the world! “I saw him ride the bike!” Mr. John Deer speaks. “The speed he sets is 133.78 kilometers per hour,” which sounds incredible; sounds deceiving.
I want to end up with the following array:
words[1] = "A"
words[2] = "man's"
words[3] = "jacket"
...
words[n+1] = "color"
words[n+2] = "."
words[n+3] = "He"
words[n+4] = "-"
words[n+5] = "the"
...
The array should contain every word and every punctuation mark as a separate element. Can this be done with a regular expression? Can anyone help me compose one? Thanks!
EDIT: As requested, here is my work so far. I currently process the text with the following function, but I would like to do the same with a regex:
$text = explode(' ', $this->rawText);
$marks = array('.', ',', '?', '!', ':', ';', '-', '--', '...');
for ($i = 0, $j = 0; $i < count($text); $i++, $j++) {
    $skip = false;
    // check whether the word contains a punctuation mark
    foreach ($marks as $value) {
        $markPosition = strpos($text[$i], $value);
        // if it does, separate the punctuation mark from the word
        if ($markPosition !== false) {
            // position 0 most likely means the token is a punctuation mark by itself, e.g. a dash
            if ($markPosition === 0) {
                // add the standalone mark to the array
                $words[$j] = new Word($j, $text[$i], 2, $this->phpMorphy);
            } else {
                // add the word without its trailing character
                $words[$j] = new Word($j, substr($text[$i], 0, strlen($text[$i]) - 1), 0, $this->phpMorphy);
                // add the trailing punctuation mark as its own element
                $punctMark = substr($text[$i], -1);
                $j += 1;
                $words[$j] = new Word($j, $punctMark, 1, $this->phpMorphy);
            }
            $skip = true;
            break;
        }
    }
    if (!$skip) {
        // plain word without any punctuation
        $words[$j] = new Word($j, $text[$i], 0, $this->phpMorphy);
    }
}
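For illustration, this is a minimal sketch of the kind of regex-based splitting I have in mind. It is not a finished solution. The function name `tokenize` and the pattern are my own assumptions, the result is 0-indexed rather than starting at 1, and abbreviations such as "Mr." would still be split into "Mr" and ".":

```php
<?php
// Hypothetical helper (an assumption, not existing code): returns words and
// punctuation marks as separate array elements, in their original order.
function tokenize(string $text): array
{
    // Alternatives are tried left to right:
    //   \d+(?:[.,]\d+)*                   numbers such as 230 or 133.78, kept whole
    //   \p{L}+(?:['\x{2019}\-]\p{L}+)*    words, allowing an inner apostrophe or hyphen
    //   \.{3}|--                          multi-character marks
    //   [^\s\p{L}\p{N}]                   any other single non-space character
    $pattern = '/\d+(?:[.,]\d+)*|\p{L}+(?:[\'\x{2019}\-]\p{L}+)*|\.{3}|--|[^\s\p{L}\p{N}]/u';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Example call on a fragment of the sample text
$words = tokenize('He rides bikes very fast (230 km per hour). How is it possible?!');
print_r($words);
```

The `u` modifier makes the pattern Unicode-aware, so typographic quotes and dashes are matched as single punctuation characters rather than as raw bytes. Each element could then be wrapped in a `Word` object the same way the loop above does.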