I am editing some Interspire Email code. Currently, before sending, the program goes through the HTML of the email and looks for 'a href' attributes so it can replace the links. I want it to also find form action="" attributes and replace the URLs in them (it does not currently do this). I think I can use the regex from this Stack Overflow post:

PHP - Extract form action url from mailchimp subscribe form code using regex

but I'm having some difficulty wrapping my head around how to handle the arrays. The current code that just does the 'a href=' is below:

    preg_match_all('%<a.+(href\s*=\s*(["\']?[^>"\']+?))\s*.+>%isU', $this->body['h'], $matches);
    $links_to_replace = $matches[2];
    $link_locations = $matches[1];

    arsort($link_locations);
    reset($links_to_replace);
    reset($link_locations);

    foreach ($link_locations as $tlinkid => $url) {
        // so we know whether we need to put quotes around the replaced url or not.
        $singles = false;
        $doubles = false;

        // make sure the quotes are matched up.
        // ie there is either 2 singles or 2 doubles.
        $quote_check = substr_count($url, "'");
        if (($quote_check % 2) != 0) {
        ...

I know (or think I know) that I need to replace the preg_match_all call with:

    preg_match_all(array('%<a.+(href\s*=\s*(["\']?[^>"\']+?))\s*.+>%isU', '|form action="([^"]*?)" method="post" id="formid"|i'), $this->body['h'], $matches);

but then how is '$matches' handled?

    $links_to_replace = $matches[2];
    $link_locations = $matches[1];

That no longer holds true, does it? Is it possible to do what I'm thinking? Or would I need to write another function to handle the 'form action=' case separately from the 'a href' case?

brizz
  • It looks like you're trying to pass multiple args in `preg_match_all()` here. Why not just use `preg_replace()` for replacing the `form action=` urls or `DOM`? If I assume correctly, you want to replace the urls from elements of an array, this is better done using `preg_replace_callback()` – hwnd Dec 22 '13 at 18:24
  • If you want to perform replacements, why don't you use preg_replace or preg_replace_callback? – Casimir et Hippolyte Dec 22 '13 at 18:26
  • I am not very proficient in PHP, so don't really know the best way (or only way) to go about things. So if I want to add a function, normally I just go off of what is already there and try to add on to it. Are you saying preg_match_all wouldn't work? I assume they had a reason for using that as opposed to preg_replace() – brizz Dec 22 '13 at 18:28
  • Well it depends, like I stated if you're wanting to find all `form action=` urls then replace those urls from array elements, then best to use `preg_replace_callback()` to perform this. If this is not the case, then please state what you exactly want to do more clearly =) – hwnd Dec 22 '13 at 18:29

1 Answer

A suggestion:

    $pattern = <<<'LOD'
    ~
    (?|            # branch reset feature: allows to have the same named
                   # capturing group in an alternation. ("type" here)
        <a\s           # the link case
        (?>  # atomic group: possible content before the "href" attribute
            [^h>]++        # all that is not a "h" or the end of the tag ">"
          |
            \Bh++          # all "h" not preceded by a word boundary
          |
            h(?!ref\s*+=)  # all "h" not followed by "ref=" or "ref    ="
        )*+  # repeat the atomic group zero or more times.
        (?<type> href )
      | #### OR ####
        <form\s        # the form case
        (?> # possible content before the "action" attribute. (same principle)
            [^a>]++
          |
            \Ba++
          |
            a(?!ction\s*+=)
        )*+
        (?<type> action )
    )
    \s*+ = \s*+     # optional spaces before and after the "=" sign
    \K              # resets all on the left from match result
    (?<quote> ["']?+ )
    (?<url> [^\s"'>]*+ )
    \g{quote}       # backreference to the "quote" named capture (", ', empty)
    ~xi
    LOD;

Note that, because of the `\K`, the overall match contains only the URL and its surrounding quotes (if any). The attribute name is still stored in the named capture group "type" if you need it.
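To illustrate how the match array is laid out when one pattern covers both cases, here is a minimal sketch. It uses a shortened stand-in pattern (my assumption, not the pattern above): the attribute name is the numbered group 1 inside a branch reset, and `quote` and `url` are named groups. The sample HTML and the `$found` variable are made up for the example.

```php
<?php
// Shortened stand-in pattern (an assumption for illustration only):
// group 1 = attribute name ("href" or "action"), plus named groups
// "quote" and "url". No \K here, so $m[0] holds the whole match.
$pattern = '~(?|<a\s[^>]*?\b(href)|<form\s[^>]*?\b(action))'
         . '\s*=\s*(?<quote>["\']?)(?<url>[^\s"\'>]*)\g{quote}~i';

// Hypothetical sample input.
$html = '<a href="http://example.com/page">link</a>'
      . '<form method="post" action="http://example.com/subscribe"></form>';

// PREG_SET_ORDER gives one array per match, so each $m holds the
// attribute name and the URL of the same tag side by side.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    $found[] = [strtolower($m[1]), $m['url']];
}

// $found is now:
// [['href', 'http://example.com/page'], ['action', 'http://example.com/subscribe']]
print_r($found);
```

With `PREG_SET_ORDER` there is no need to keep two parallel arrays such as `$matches[1]` and `$matches[2]` in sync, which is what made the original loop awkward to extend.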

Then you can use all of this with:

    $html = preg_replace_callback($pattern,
        function ($m) {
            $url = $m['url'];
            $type = strtolower($m['type']); // "href" or "action"
            $quote = $m['quote'];
            // do what you want with the url, type and quotes
            return $quote . $url . $quote;
        }, $html);
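As a worked example of the callback approach, the sketch below rewrites every matched URL to point at a tracker. Everything named here is hypothetical: the `rewrite_urls()` helper, the `http://t.example` base and its `/type/encoded-url` layout are assumptions for illustration, not anything Interspire provides. It also uses a shortened stand-in pattern rather than the full one above.

```php
<?php
// Hypothetical helper: replaces href/action URLs with tracker URLs.
function rewrite_urls(string $html, string $trackerBase): string
{
    // Shortened stand-in pattern (assumption). Group 1 is the attribute
    // name; \K drops everything before the quote from the replaced text.
    $pattern = '~(?|<a\s[^>]*?\b(href)|<form\s[^>]*?\b(action))'
             . '\s*=\s*\K(?<quote>["\']?)(?<url>[^\s"\'>]*)\g{quote}~i';

    return preg_replace_callback($pattern, function (array $m) use ($trackerBase) {
        $type = strtolower($m[1]);              // "href" or "action"
        $tracked = $trackerBase . '/' . $type . '/' . rawurlencode($m['url']);
        // Put back whatever quoting the original attribute used.
        return $m['quote'] . $tracked . $m['quote'];
    }, $html);
}

// Hypothetical sample input with mixed quote styles.
$in = '<a class="x" href="http://example.com/a">A</a>'
    . "<form method=\"post\" action='http://example.com/s'>";

echo rewrite_urls($in, 'http://t.example'), "\n";
```

Because only the quoted URL is part of the match, the rest of each tag (other attributes, the tag name) is left untouched, so the same function handles links and forms without a second pass.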
Casimir et Hippolyte