2

I'm trying to use a regex with preg_split to separate a URL from the rest of a string:

    $body = "blah blah blah http://localhost/tomato/veggie?=32";
    $regex = "(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)";
    $url = preg_split($regex, $body);

The resulting array is:

    array(2) (
    [0] => (string) blah blah blah 
    [1] => (string))

I would like to return:

    array(2) (
    [0] => (string) blah blah blah 
    [1] => (string) http://localhost/tomato/veggie?=32)

I'm not sure what I'm doing wrong here. Any advice would be appreciated.

Akersh

3 Answers

4

Try adding another set of brackets to capture the entire URL, and pass the optional PREG_SPLIT_DELIM_CAPTURE flag to preg_split():

    $regex = "((((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+))";
    // -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1
    $url = preg_split($regex, $body, -1, PREG_SPLIT_DELIM_CAPTURE);

Output:

    array(5) {
      [0]=>
      string(15) "blah blah blah "
      [1]=>
      string(34) "http://localhost/tomato/veggie?=32"
      [2]=>
      string(7) "http://"
      [3]=>
      string(2) "ht"
      [4]=>
      string(0) ""
    }
Rowlf
  • You could add 2 non-capture groups to clean the output up, like this: `(((?:(?:f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+))`. That removes `[2]` and `[3]` from the array. `:)` – Biotox Jan 25 '12 at 22:35
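
Putting the answer and the comment together, here is a minimal sketch that returns only the surrounding text and the URL. It assumes `!` as the pattern delimiter (so `/` needs no escaping) and adds the `PREG_SPLIT_NO_EMPTY` flag to drop the empty trailing element. The helper name `splitOnUrl` is hypothetical and does not come from the thread.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function splitOnUrl(string $body): array
{
    // One capturing group around the whole URL; the scheme alternation is
    // non-capturing so it does not show up in the result.
    $regex = '!((?:f|ht)tp://[-a-zA-Z0-9@:%_+.~#?&/=]+)!';

    // DELIM_CAPTURE returns the captured URL alongside the split pieces;
    // NO_EMPTY discards the empty string that follows a URL at the end.
    $parts = preg_split($regex, $body, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    return $parts === false ? [] : $parts;
}

$parts = splitOnUrl("blah blah blah http://localhost/tomato/veggie?=32");
var_dump($parts);
```

For the sample string this should yield two elements: `"blah blah blah "` (with its trailing space) and the URL.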
1

It's failing because you are splitting on a URL, not on a delimiter. The delimiter in this case is the "last space before ftp or http":

    $body = "blah blah blah http://localhost/tomato/veggie?=32";
    $regex = '/\s+(?=(f|ht)tp:\/\/)/';
    $url = preg_split($regex, $body);

To break down the regular expression:

\s+ - One or more whitespace characters
(?=...) - Positive look-ahead (match stuff in this group, but don't consume it)
(f|ht)tp:\/\/ - ftp:// or http://
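
As a quick check of the breakdown above, the following sketch runs the look-ahead split on the sample string. The variable name `$pieces` is only illustrative, and the group is written as non-capturing here, which is an assumption that does not change the result since no capture flag is passed.

```php
<?php
$body = "blah blah blah http://localhost/tomato/veggie?=32";

// \s+ is the actual delimiter and is consumed by the split;
// the look-ahead only asserts that ftp:// or http:// follows.
$pieces = preg_split('/\s+(?=(?:f|ht)tp:\/\/)/', $body);

var_dump($pieces);
```

Because the whitespace is consumed as the delimiter, the first element should come back as `"blah blah blah"` without a trailing space, followed by the URL.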
Sean Bright
  • If a word came after the URL, like `blah blah blah http://localhost/tomato/veggie?=32 test`, it would be added into the part with the URL. `array([0]=>'blah blah blah ',[1]=>'http://localhost/tomato/veggie?=32 test')` – Biotox Jan 25 '12 at 22:38
  • Indeed. Luckily that doesn't apply in this case. – Sean Bright Jan 25 '12 at 22:43
0

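The first issue is that your regex has no explicit delimiters (e.g. surrounding slashes). PHP also accepts matching brackets as delimiters, so the outermost parentheses are treated as the delimiters rather than as a capturing group.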

The second issue is that given the sample output you provided, you may want to look into using preg_match instead.

Try this, see if it's what you want:

    $body = "blah blah blah http://localhost/tomato/veggie?=32";
    $regex = "/^(.*?)((?:(?:f|ht)tps?:\/\/).+)/i";
    preg_match($regex, $body, $url);
    print_r($url);
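
To show how the match array might be consumed, here is a small sketch using the same pattern. The variable names `$text` and `$link` are assumptions for illustration. Note that `.+` is greedy, so any words after the URL would end up in the second group as well.

```php
<?php
$body  = "blah blah blah http://localhost/tomato/veggie?=32";
$regex = '/^(.*?)((?:(?:f|ht)tps?:\/\/).+)/i';

$text = null;
$link = null;

// preg_match returns 1 on a match; $m[0] is the whole match,
// $m[1] the text before the URL, $m[2] the URL and whatever follows it.
if (preg_match($regex, $body, $m) === 1) {
    $text = $m[1];
    $link = $m[2];
}

var_dump($text, $link);
```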
Chris Tonkinson