Let's say I have a string like this

I am flying from "Detroit to Vancouver" this July

$string = 'I am flying from "Detroit to Vancouver" this July';

I also have an array of "stopwords" (words that I'm choosing to remove from the string/strings)

$stopwords = array( "to", "anotherstopword", "andanother" );

Right now I'm just using

$string = str_replace($stopwords, ' ', $string);

This of course gives me string(33) "I am flying from "Detroit Vancouver" this July"

I was thinking about maybe exploding the $string with a space before the str_replace, giving me something like

Array
(
    [0] => I
    [1] => am
    [2] => flying
    [3] => from
    [4] => "Detroit
    [5] => to
    [6] => Vancouver"
    [7] => this
    [8] => July
)

Then perhaps removing the quoted elements from the array, doing the replace, and re-inserting them, but this seems like overkill.
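A minimal sketch of that split-and-rejoin idea, assuming quoted spans never contain escaped quotes. It splits on the quoted spans themselves rather than on spaces, so each span stays in one piece. The sample string and the padded stop words here are illustrative, not taken from real data.

```php
<?php
// Sketch only: split on double-quoted spans (kept via DELIM_CAPTURE),
// run str_replace on the unquoted pieces, then glue everything back.
$stopwords = array(' to ', ' test ');  // padded so only whole words match
$string = 'I am flying from "Detroit to Vancouver" to see a test "match to win"';

$parts = preg_split('/("[^"]*")/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($parts as $i => $part) {
    // Odd indexes hold the captured quoted spans; leave them alone.
    if ($i % 2 === 0) {
        $parts[$i] = str_replace($stopwords, ' ', $part);
    }
}

$result = implode('', $parts);
echo $result;
```

Because there is exactly one capture group, the pieces alternate between unquoted text (even indexes) and quoted spans (odd indexes), which is what the index check relies on.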

I've also thought about using a function like this

  function getStringBetween($str, $from, $to, $withFromAndTo = false)
  {
      $sub = substr($str, strpos($str, $from) + strlen($from), strlen($str));
      if ($withFromAndTo)
          return $from . substr($sub, 0, strrpos($sub, $to)) . $to;
      else
          return substr($sub, 0, strrpos($sub, $to));
  }

When doing so,

    echo '<pre>';
    print_r(getStringBetween($string, '"', '"'));
    echo '</pre>';

Outputs:

Detroit to Vancouver

I could then add some kind of ignore condition before the str_replace.

But this fails whenever there are multiple quoted sections in the string.

Ideally, if the string contains double quotes, I would like the str_replace process to ignore the quoted text entirely.

I am of course not opposed to using something other than str_replace, like preg_replace, but I do not have enough experience with that to produce a sample for my expected output.

Can anyone think of a good way to skip quoted text when removing the stop words?

EDIT:

Code Sample

<?php

  $stopwordstest = array( " to ", " a ", " test " );

  $string = 'I am flying from "Detroit to Vancouver" this July when the weather is test nice';

  var_dump($string);

// as is, without string replace
// string(79) "I am flying from "Detroit to Vancouver" this July when the weather is test nice" 

  $string = str_replace($stopwordstest, ' ', $string);

  echo '<br><br>';

  var_dump($string);

// string(71) "I am flying from "Detroit Vancouver" this July when the weather is nice"

// Expected output is:
//
// string(74) "I am flying from "Detroit to Vancouver" this July when the weather is nice"
//

?>

In other words, I'd like the string replacement to go forth as intended, but since the word to is encapsulated in quotes ("Detroit to Vancouver"), it should skip this word because it is in quotes.

bbruman
  • I think you missed a step in your explanation: with your example inputs, what do you want the outputs to be? If you give 2 or 3 example inputs and their desired outputs I can probably help. – Bing Jun 03 '18 at 23:56
  • Added a code sample. In case it is not clear, all I am trying to do is use str_replace or preg_replace to remove an array of words that are NOT in double quotes. If a group of words is enclosed in quotes, I'd like it to remain unmodified, or be skipped, by my replacement method. – bbruman Jun 04 '18 at 00:14
  • Does your string always look like this? I mean, does it have only one part in double quotes, or may it have multiple quoted parts? – Masoud Haghbin Jun 04 '18 at 00:24
  • As of now I am only encountering one part containing double quotes, but I'd like it to accommodate multiple parts if they appear. Check out @revo's answer, it seems to be a great solution. – bbruman Jun 04 '18 at 00:28

2 Answers


This is easy with regular expressions, and easier still with PHP's PCRE engine. PCRE lets you match and then skip text using the (*SKIP) backtracking verb: you match a double-quoted string, make the engine skip that part of the subject, and put the pattern you actually want on the other side of the alternation.

"[^"\\]*(?:\\.[^"\\]*)*"(*SKIP)(*F)

The regex above matches a double-quoted string (including escaped double quotation marks), then (*F) forces that match to fail and (*SKIP) makes the engine resume after the quoted span instead of retrying inside it.
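To see the skip behaviour on its own, here is a small sketch that pairs the quoted-string pattern with a plain `\bto\b` on the other side of the alternation and lists where the surviving matches start. The subject string is made up for illustration, with two quoted spans and escaped quotes inside the second.

```php
<?php
// Illustrative input with two quoted spans; not from the original post.
// In a single-quoted PHP string, \" stays a literal backslash plus quote.
$subject = 'to "Detroit to Vancouver" and back to "Ann \"to\" Arbor" to';

// Left branch: consume a quoted span, then (*SKIP)(*F) discards it.
// Right branch: the word we actually want to find.
$pattern = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\bto\b/';

preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

// Collect the byte offset of every surviving match.
$offsets = array_column($matches[0], 1);
print_r($offsets);
```

Only the three occurrences of "to" outside the quotes are reported; the ones inside either quoted span never reach the right-hand branch.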

This is the PHP code that implements it, building the stop words into an alternation within the same regex:

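// Skip double-quoted spans; elsewhere, remove whole stop words plus trailing spaces.
$words = array_map(function ($w) { return preg_quote($w, '/'); }, $stopwords);
echo preg_replace('/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\b(?:'
    . implode('|', $words)
    . ')\b\h*/', '', $string);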
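As a usage sketch, the same pattern can be wrapped in a helper and run against the sample from the question's edit. The function name `removeStopwordsOutsideQuotes` is hypothetical, and the stop words are assumed to be bare words (no padding spaces, unlike `$stopwordstest`), since the pattern already relies on `\b` for word boundaries.

```php
<?php
// Hypothetical helper name; wraps the same pattern in a reusable function.
function removeStopwordsOutsideQuotes(string $text, array $stopwords): string
{
    // Escape each stop word for use inside a /-delimited pattern.
    $words = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $stopwords);

    // Quoted spans are matched and skipped; stop words elsewhere are removed
    // together with any horizontal whitespace that follows them.
    $pattern = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|\b(?:'
        . implode('|', $words)
        . ')\b\h*/';

    return preg_replace($pattern, '', $text);
}

$stopwords = array('to', 'a', 'test');
$string = 'I am flying from "Detroit to Vancouver" this July when the weather is test nice';

$result = removeStopwordsOutsideQuotes($string, $stopwords);
var_dump($result);
```

With this input the quoted "to" survives and only "test" is removed, which gives the 74-character string the question lists as its expected output.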

revo
  • YES! That works brilliantly. That regex is so crazy it would have taken me days to even get a grasp around it. Need to give it a real in-depth look one of these days because the possibilities (as shown) are very effective! Thank you. – bbruman Jun 04 '18 at 00:30
foreach ($stopwords as &$stopword) {
    $string = str_replace($stopword, ' ', $string);
}
Masoud Haghbin
  • Not sure about how this is supposed to work, but it is still removing the words in double-quotes. Using the edited sample the output is the same `string(71) "I am flying from "Detroit Vancouver" this July when the weather is nice"` as a normal `str_replace` – bbruman Jun 04 '18 at 00:17
  • yep , sorry I made a mistake – Masoud Haghbin Jun 04 '18 at 00:19
  • Thanks for trying. The `&$` operator is an interesting (possible) option nonetheless. – bbruman Jun 04 '18 at 00:21