1

I want to extract two consecutive words starting from each word in a string.

$string = "This is my test case for an example.";

If I explode on each space, I get every word individually, but I don't want that.

[
    'This',
    'is',
    'my',
    'test',
    'case',
    'for',
    'an',
    'example.'
];

What I want is to get each word and its next word including the delimiting space.

Desired output:

[
    'This is',
    'is my',
    'my test',
    'test case',
    'case for',
    'for an',
    'an example.'
]
mickmackusa
azzy81
  • [Split string on every second space to isolate every two words](https://stackoverflow.com/q/840807/2943403) is not a duplicate. It demonstrates how to split on every second space to show consecutive pairs of words (with no duplicate words in the result). This question requires "overlap": the same word is used as the second word of one element and as the first word of the next. – mickmackusa Oct 22 '22 at 02:25

4 Answers

3

This will provide the output you're looking for:

$string = "This is my test case for an example.";
$tmp = explode(' ', $string);
$result = array();
//assuming $string contains more than one word
for ($i = 0; $i < count($tmp) - 1; ++$i) {
    $result[$i] = $tmp[$i].' '.$tmp[$i + 1];
}
print_r($result);

Wrapped in a function:

function splitWords($text, $cnt = 2) 
{
    $words = explode(' ', $text);

    $result = array();

    $icnt = count($words) - ($cnt-1);

    for ($i = 0; $i < $icnt; $i++)
    {
        $str = '';

        for ($o = 0; $o < $cnt; $o++)
        {
            $str .= $words[$i + $o] . ' ';
        }

        array_push($result, trim($str));
    }

    return $result;
}
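
The same idea can be written with array_slice() and implode() instead of an inner loop. This is only a sketch: the name wordWindows() is hypothetical, and splitting on runs of whitespace with preg_split() is an assumption made here so that repeated spaces do not produce empty "words".

```php
<?php
// Hypothetical helper (not part of the original answer):
// returns every run of $size consecutive words, overlapping by $size - 1 words.
function wordWindows(string $text, int $size = 2): array
{
    if ($size < 1) {
        return []; // assumption: a window must hold at least one word
    }

    // PREG_SPLIT_NO_EMPTY drops empty tokens caused by leading, trailing, or repeated whitespace.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $result = [];
    $lastStart = count($words) - $size; // negative when there are too few words
    for ($i = 0; $i <= $lastStart; $i++) {
        $result[] = implode(' ', array_slice($words, $i, $size));
    }

    return $result;
}

print_r(wordWindows('This is my test case', 3));
```

With a window of three words, the call above should yield 'This is my', 'is my test', and 'my test case'. A string with fewer words than the window size returns an empty array.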
farinspace
Sylvain
2

An alternative, making use of 'chasing pointers', would be this snippet.

$arr = explode( " ", "This is an example" );
$result = array();

$previous = $arr[0];
array_shift( $arr );
foreach( $arr as $current ) {
    $result[]=$previous." ".$current;
    $previous = $current;
}

echo implode( "\n", $result );

It's always fun not to need indices and counts, and to leave all of that internal bookkeeping to foreach (or array_map, or the like).
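
For the array_map route mentioned above, one possible sketch (an illustration, not taken from the answer) passes two shifted copies of the word list, so that each callback invocation receives a word together with its successor:

```php
<?php
$words = explode(' ', 'This is an example');

// The first list drops the last word and the second list drops the first word,
// so both lists have the same length and line up as (previous, current) pairs.
$result = array_map(
    function ($previous, $current) {
        return "$previous $current";
    },
    array_slice($words, 0, -1),
    array_slice($words, 1)
);

echo implode("\n", $result), "\n";
```

For a single-word string both slices are empty, so the result is an empty array without any special-casing.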

xtofl
0

A short solution without loops (and a variable word count):

    function splitStrByWords($sentence, $wordCount=2) {
        $words = array_chunk(explode(' ', $sentence), $wordCount);
        return array_map('implode', $words, array_fill(0, sizeof($words), ' '));
    }
  • Returns n-word segments, but does not step back to reuse the previous word – farinspace Jan 25 '11 at 05:15
  • This snippet simply doesn't run in modern PHP. https://3v4l.org/RKFkV After adjusting the implode parameters, it gives the wrong result. https://3v4l.org/iphLT – mickmackusa May 12 '23 at 18:51
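
To illustrate both comments, here is a sketch of the same array_chunk() idea with the implode() arguments in separator-first order (the array-first order was removed in PHP 8.0). It runs, but the chunks do not overlap, so it does not produce the output the question asks for. The variable names are chosen here for illustration.

```php
<?php
$sentence = 'This is my test case for an example.';

// array_chunk() cuts the word list into consecutive, non-overlapping groups of two.
$chunks = array_chunk(explode(' ', $sentence), 2);

// Join each group back into a string, passing the separator as the first argument.
$pairs = array_map(
    function (array $chunk) {
        return implode(' ', $chunk);
    },
    $chunks
);

var_export($pairs);
```

This yields 'This is', 'my test', 'case for', and 'an example.': four pairs instead of the seven overlapping ones.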
-1

The only single-function-call approach that directly generates the desired output involves a capture group inside a lookahead, called by preg_split(). A simpler pattern can be used with preg_match_all(), but it generates a 2D array instead of a 1D array.

Code: (Demo)

var_export(
    preg_split(
        "/(?=(\S+ \S+))\S+ (?:\S+$)?/",
        $string,
        0,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    )
);

preg_match_all() version: (Demo)

preg_match_all(
    "/(?=(\S+ \S+))\S+ /",
    $string,
    $m
);
var_export($m[1]);

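Or, if the text contains only word characters and spaces (note that \w does not match punctuation, so the trailing period of the sample string is dropped): (Demo)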

preg_match_all(
    "/(?=(\b\w+ \w+))/",
    $string,
    $m
);
var_export($m[1]);
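
The reason the pair is captured inside a lookahead can be seen by comparing it with a plain pattern. The sketch below is illustrative only, and the variable names $consumed and $overlapping are chosen here:

```php
<?php
$string = 'This is my test case for an example.';

// Without a lookahead, each match consumes both words,
// so the next match can only start after the second word.
preg_match_all('/\S+ \S+/', $string, $consumed);

// With the pair captured inside a lookahead, only the first word and the space
// are consumed, so the second word is still available to start the next match.
preg_match_all('/(?=(\S+ \S+))\S+ /', $string, $overlapping);

var_export($consumed[0]);
var_export($overlapping[1]);
```

The first call returns four non-overlapping pairs; the second returns the seven overlapping pairs from the question.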

Using the same splitting regex as demonstrated in Split string on every second space to isolate every two words will work if you first duplicate every word that is both preceded and followed by a space.

Code: (Demo)

var_export(
    preg_split(
        "/\S+ \S+\K /",
        preg_replace(
            '/(?<= )(\S+ )\K/',
            '$1',
            $string
        )
    )
);

Or use explode() with a static variable to hold the previous word for the next iteration inside of array_reduce(). (Demo)

var_export(
    array_reduce(
        explode(' ', $string),
        function($result, $word) {
            static $last = '';
            if ($last) {
                $result[] = "$last $word";
            }
            $last = $word;
            return $result;
        },
        [] // initial value, so a single-word string yields [] instead of null
    )
);

Or a classic loop over the exploded string while holding the previous iteration's word. (Demo)

$result = [];
$last = null;
foreach (explode(' ', $string) as $word) {
    if ($last) {
        $result[] = "$last $word";
    }
    $last = $word;
}
var_export($result);

Or explode then append and unset data: (Demo)

$result = explode(' ', $string);
foreach ($result as $i => $word) {
    if (isset($result[$i + 1])) {
        $result[$i] .= " {$result[$i + 1]}";
    } else {
        unset($result[$i]);
    }
}
var_export($result);

Basic for() loop with n-1 iterations (won't display a lone word string): (Demo)

$words = explode(' ', $string);
$result = [];
for ($i = 1, $max = count($words); $i < $max; ++$i) {
    $result[] = $words[$i - 1] . ' ' . $words[$i];
}
var_export($result);
mickmackusa