
Here is my code:

$str = "this is a test";
$arr = explode(' ', $str);
/* output:
array (
    0 => "this",
    1 => "is",
    2 => "a",
    3 => "test"
)
*/

All I'm trying to do is add this condition to the explode() function:

If the word "a" is followed by the word "test", treat them as one word.

So this is the expected output:

/* expected output:
array (
    0 => "this",
    1 => "is",
    2 => "a test"
)
*/

In other words, I want something like this: /a[ ]+test|[^ ]+/. But I cannot hard-code that pattern as a replacement for explode(), because in reality there are lots of two-part words that I need to handle. That is, I have an array of phrases, each of which should be treated as one word:

$one_words = array("a test", "take off", "go away", "depend on", ....);

Any ideas?

stack
  • The last sentence of your question confuses me. It sounds like `preg_split` is what you want up until then. – Marty Mar 15 '17 at 05:22
  • downvoter, please leave a comment and explain what's wrong with my question? – stack Mar 15 '17 at 05:22

3 Answers


You can use implode to join all the reserved phrases into a regex alternation and use it in preg_match_all like this:

$str = "this is a test";
$one_words = array("a test", "take off", "go away", "depend on");

preg_match_all('/\b(?:' . implode('|', $one_words) . ')\b|\S+/', $str, $m); 
print_r($m[0]);

Output:

Array
(
    [0] => this
    [1] => is
    [2] => a test
)

The regex we're using is built like this:

'/\b(?:' . implode('|', $one_words) . ')\b|\S+/'

For the given values in your array it will effectively be this:

\b(?:a test|take off|go away|depend on)\b|\S+

This matches one of the phrases from the array where possible, and otherwise falls back to any run of non-whitespace characters via \S+.
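If the phrase list may contain regex metacharacters or end in punctuation (for example "let's party!"), the pattern above probably needs two adjustments: escaping each phrase with preg_quote, and replacing \b with whitespace lookarounds, since \b does not match between "!" and a following space. The following is only a sketch under those assumptions; the function name split_keeping_phrases and the longest-first sort are illustrative additions, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): tokenizes $str,
// keeping every phrase listed in $phrases together as a single token.
function split_keeping_phrases(string $str, array $phrases): array
{
    // With no phrases, fall back to a plain whitespace split.
    if (!$phrases) {
        return preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    }

    // Try longer phrases first so a longer phrase wins over its own prefix.
    usort($phrases, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape regex metacharacters and the "/" delimiter in each phrase.
    $escaped = array_map(function ($p) {
        return preg_quote($p, '/');
    }, $phrases);

    // (?<!\S) and (?!\S) require whitespace or a string edge on each side;
    // unlike \b, they also hold for phrases that end in punctuation.
    $pattern = '/(?<!\S)(?:' . implode('|', $escaped) . ')(?!\S)|\S+/';

    preg_match_all($pattern, $str, $m);
    return $m[0];
}

print_r(split_keeping_phrases("this is a test", array("a test", "take off")));
```

For the question's input this should give the same three elements as above. A phrase is only recognized when its words are separated exactly as written in the list, so "a  test" with two spaces would still be split.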

anubhava
  • How will this behave if the array contains something like `let's party!`? Probably not likely but just in case. – Marty Mar 15 '17 at 05:37
  • `let's party` will also be captured separately like `a test` in this example. – anubhava Mar 15 '17 at 05:40
  • How will this behave if the array grows to 1000 items, and `$str` contains 1 million words? *(Will it be fast enough?)* – stack Mar 15 '17 at 05:43
  • I haven't done any benchmarking on huge inputs because the question was all about using a condition in explode/split. I would say that any regex-based solution or string search routine tends to get a bit slow when searching through terabytes of data. – anubhava Mar 15 '17 at 05:47
  • I see, thank you, upvoted. What do you think about my answer, though? – stack Mar 15 '17 at 05:56

You can split the string on spaces and then rejoin adjacent words that form a known phrase. Something like this:

$str = "this is a test";
$one_words = array("a test", "take off", "go away", "depend on");

// Split the string on spaces
$arr = explode(' ', $str);

// Remove empty elements (from repeated spaces) and reindex,
// so that $k+1 is always the key of the next word
$arr = array_values(array_filter($arr, 'strlen'));

foreach ($arr as $k => $v) {
    // Skip a word that was already merged into the previous element
    if (!isset($arr[$k])) {
        continue;
    }
    if (isset($arr[$k+1])) {
        $combined_word = $arr[$k] . ' ' . $arr[$k+1];
        if (in_array($combined_word, $one_words)) {
            $arr[$k] = $combined_word;
            unset($arr[$k+1]);
        }
    }
}

print_r($arr);
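On the performance question raised in the comments: in_array scans the whole phrase list for every pair of words, so one possible refinement is to flip the list into array keys and test membership with isset. This is a sketch, not a benchmarked result; the function name merge_known_pairs is hypothetical, and like the loop above it only handles two-word phrases.

```php
<?php
// Hypothetical variant of the loop above; names are illustrative only.
function merge_known_pairs(string $str, array $one_words): array
{
    // Flip phrases into keys so membership is a hash lookup, not a linear scan.
    $lookup = array_flip($one_words);

    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    $result = array();
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        if ($i + 1 < $count && isset($lookup[$words[$i] . ' ' . $words[$i + 1]])) {
            // The current and next word form a known phrase: emit them together.
            $result[] = $words[$i] . ' ' . $words[$i + 1];
            $i++; // the next word has been consumed
        } else {
            $result[] = $words[$i];
        }
    }
    return $result;
}

print_r(merge_known_pairs("this is a test", array("a test", "take off", "go away", "depend on")));
```

Building a fresh result array also avoids modifying the array that is being iterated, and the result is indexed consecutively.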


stack
  • What about my answer? Don't you think it would be simpler? – Mustofa Rizwan Mar 15 '17 at 06:01
  • It should work ++ but not sure if it will be faster / simpler. – anubhava Mar 15 '17 at 06:05
  • @RizwanM.Tuman Your answer only works for `a`. As I've mentioned in the question, there are lots of two-part words *(e.g. `depend on`)* which should be treated as one word. So your approach won't work: https://3v4l.org/UILv1 – stack Mar 15 '17 at 06:05
  • @stack yeah, that's true ... I missed the latter part, or maybe it was edited later :) – Mustofa Rizwan Mar 15 '17 at 06:07

@stack try the approach below; it gives your desired output for this particular string:

<?php
$str = "this is a test";
$arr = strpos($str, "a") < strpos($str,"test") ? explode(" ", $str, 3) : explode(" ", $str);
print_r($arr);
lazyCoder