
I've simplified this question because it was getting quite long. Basically, I want a substring of $subject that runs from the start of $subject up to the match the callback function is currently handling. Here is an example of some input (JavaScript):

$subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";

I'm using a url matching regex in my preg_replace_callback, so it will match http://google.co.uk. I want to get a substring of $subject up to the start of that match: var myUrl = '<a href=" should be contained in the substring. How can I do this?

$subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";
preg_replace_callback("MY URL MATCHING PATTERN", function($matches) {
  // Get length of $subject up to the current match
  $length = ?; // this is the bit I can't work out
  // Get substring
  $before = substr($subject, 0, $length);
  // Work out whether or not to escape the single quotes
  $quotes = array();
  preg_match_all("/'/", $before, $quotes);
  $quotecount = count($quotes);
  $escape = ($quotecount % 2 == 0 ? "" : "\\");
  // Return the binary value
  return "javascript:parent.query(".$escape."'".textToBinary($matches[0]).$escape."')";
}, $subject);
Peter Gordon
  • Either use a DOM parser instead of regex – or do not modify the actual source code, but attach event handlers to the links instead at run-time, and have those call `window.open` with the content of the `href` attribute of the respective link. – CBroe Oct 12 '15 at 20:24
  • There are no single quotes in your new first example, or I don't see them. – revo Oct 13 '15 at 11:15
  • Sorry, it's not clear to me. Please consider providing a live demonstration if you can: http://ideone.com – revo Oct 13 '15 at 11:23

1 Answer


- Firstly, I recommend using PHP's DOM classes, such as DOMDocument or DOMXPath, instead of a regex.

- Secondly, it is better to revise your RegEx. (\S is the culprit: it matches any non-whitespace character, quotes included.)

- Thirdly, a quick solution to your problem is:

return "javascript:open('".str_replace("'", "\\'", $matches[0])."')";

Updated:

$subject = "var myUrl = '<a href=\"http://google.co.uk\">click me</a>';";

$pattern = "@(https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@";
$result = preg_replace_callback($pattern, function($matches) use ($subject) {
  $pos = strpos($subject, $matches[0]);
  $str = substr($subject, 0, $pos);
  $escape = (strpos($str, "'") === false) ? "'" : "\\'";
  return "javascript:parent.query({$escape}".textToBinary($matches[0])."{$escape})";
}, $subject);
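
One limitation of the strpos() lookup above is that it always finds the first occurrence of the URL, so a repeated URL gets the wrong prefix. As a rough sketch, assuming PHP 7.4 or later (where preg_replace_callback accepts a flags argument), PREG_OFFSET_CAPTURE hands the callback the offset of the current match directly. The URL pattern here is a simplified one, and textToBinaryStub() is a hypothetical stand-in for the asker's textToBinary(), which is not shown in the question:

```php
<?php
// Hypothetical stand-in for the asker's textToBinary() helper (not shown in the question).
function textToBinaryStub(string $text): string
{
    $bits = '';
    foreach (str_split($text) as $char) {
        $bits .= sprintf('%08b', ord($char));
    }
    return $bits;
}

function rewriteUrls(string $subject): string
{
    // Simplified URL pattern, assumed for illustration only.
    $pattern = '@https?://[-\w.]+(?::\d+)?(?:/[-\w/.]*)?@';

    return preg_replace_callback(
        $pattern,
        function (array $matches) use ($subject): string {
            // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
            [$url, $offset] = $matches[0];
            $before = substr($subject, 0, $offset);
            // An odd number of single quotes before the match means we are
            // inside a single-quoted string, so the new quotes need escaping.
            $quote = (substr_count($before, "'") % 2 === 0) ? "'" : "\\'";
            return "javascript:parent.query({$quote}" . textToBinaryStub($url) . "{$quote})";
        },
        $subject,
        -1,
        $count,
        PREG_OFFSET_CAPTURE
    );
}
```

The offsets refer to the original $subject, not to the partially replaced string, so counting quotes in $before is not affected by earlier replacements.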
revo
  • I thought of a better way so I updated the question. Please help! – Peter Gordon Oct 14 '15 at 13:19
  • @pgmann You're now using `preg_replace_callback` with no return value. What's the purpose? – revo Oct 14 '15 at 13:55
  • I added the return statement in. – Peter Gordon Oct 14 '15 at 15:59
  • Hmm, this seems useful: `preg_match_all` flags: `PREG_OFFSET_CAPTURE`. See the PHP Docs - http://php.net/manual/en/function.preg-match-all.php – Peter Gordon Oct 14 '15 at 16:18
  • @pgmann Yes, but your logic is wrong. If I haven't misunderstood your intention, I've posted an update. Please check. – revo Oct 14 '15 at 18:08
  • Hmm, looks good. Any way of dealing with two identical URLs? It will always find the first instance. – Peter Gordon Oct 14 '15 at 18:33
  • Or not, if the URL is replaced in the callback – Peter Gordon Oct 14 '15 at 18:41
  • I don't know why you changed the subject. It's not a good sample because you're dealing with a whole page. However, if it's what you want, you should iterate over the `$matches` array. For now the code block is just working with `$matches[0]` @pgmann – revo Oct 14 '15 at 18:55
  • That seems to work. Thanks for your time! (I changed the `$escape` line to `$escape = (substr_count($str, "'") % 2 == 0) ? "'" : "\\'";`) – Peter Gordon Oct 15 '15 at 16:11
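
For reference, the `PREG_OFFSET_CAPTURE` idea and the `substr_count` parity check from these comments can be combined so that two identical URLs are each handled at their own position. This is only a sketch under assumptions: it needs PHP 7.1 or later, the pattern is a simplified one, and the function name `rewriteUrlsByOffset` and its `$encode` parameter (standing in for `textToBinary`) are made up for illustration:

```php
<?php
function rewriteUrlsByOffset(string $subject, callable $encode): string
{
    // Simplified URL pattern, assumed for illustration only.
    $pattern = '@https?://[-\w.]+(?::\d+)?(?:/[-\w/.]*)?@';
    // Each entry of $matches[0] is [matched text, byte offset in $subject].
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

    $result = '';
    $cursor = 0; // end of the previously handled match
    foreach ($matches[0] as [$url, $offset]) {
        // Copy the untouched text between the previous match and this one.
        $result .= substr($subject, $cursor, $offset - $cursor);
        // Everything from the start of $subject up to the current match.
        $before = substr($subject, 0, $offset);
        $quote = (substr_count($before, "'") % 2 === 0) ? "'" : "\\'";
        $result .= "javascript:parent.query({$quote}" . $encode($url) . "{$quote})";
        $cursor = $offset + strlen($url);
    }
    // Append whatever follows the last match.
    return $result . substr($subject, $cursor);
}
```

It would be called as `rewriteUrlsByOffset($subject, 'textToBinary')`, assuming that helper exists in your code.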