2

I currently have this string:

"<p><iframe allowfullscreen="" class="media-element file-default" data-fid="2219" data-media-element="1" frameborder="0" height="360" src="https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed" width="640"></iframe></p>"

I'd like to remove the whole iframe element (<iframe>...</iframe>) and replace it with an <a> link to the url in the src attribute:

<p><a href="https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed">Link to youtube</a></p>

Currently, I have this regex:

$res = preg_replace('/src="(.+?)"/', '/<a href="$1">Link to youtube</a>/', $str);

With this regex, I'm able to replace the src attribute with an a element. However, I'd like to replace the whole iframe element.

What is the easiest way to achieve this?

Kaspar Lee
Bv202

3 Answers

8

Use this RegEx:

<iframe\s+.*?\s+src=(".*?").*?<\/iframe>

And this Replace:

<a href=$1>Link to youtube</a>

Which gives you the following preg_replace():

$res = preg_replace('/<iframe\s+.*?\s+src=(".*?").*?<\/iframe>/', '<a href=$1>Link to youtube</a>', $str);

Live Demo on Regex101


The RegEx also matches everything before and after the src attribute, so the whole iframe element is replaced rather than just the attribute.

How it works:

<iframe          # Opening <iframe
\s+              # Whitespace
.*?              # Optional Data (Lazy so as not to capture the src)
\s+              # Whitespace
src=             # src Attribute
(".*?")          # src Data, quotes included (e.g. "https://www.example.org")
.*?              # Optional Data (Lazy so as not to capture the closing </iframe>)
<\/iframe>       # Closing </iframe>

Thanks to @AlexBor for pointing out that the following is slightly more efficient. I would suggest using this RegEx instead:

<iframe\s+.*?\s+src=("[^"]+").*?<\/iframe>

Replaced src=(".*?") (lazy) with src=("[^"]+") (greedy)
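
To see the suggested pattern in action, here is a minimal sketch that runs it against the string from the question. It assumes the input looks like that sample (double-quoted attributes, at least one attribute before src, everything on one line); the variable names are only illustrative.

```php
<?php
// Sample input taken from the question.
$str = '<p><iframe allowfullscreen="" class="media-element file-default" data-fid="2219" data-media-element="1" frameborder="0" height="360" src="https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed" width="640"></iframe></p>';

// Pattern from this answer; capture group 1 includes the surrounding quotes.
$pattern = '/<iframe\s+.*?\s+src=("[^"]+").*?<\/iframe>/';

// The replacement is a plain string, so it takes no regex delimiters.
$res = preg_replace($pattern, '<a href=$1>Link to youtube</a>', $str);

echo $res, PHP_EOL;
```

Because the quotes are part of the capture group, the replacement writes href=$1 without adding quotes of its own.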

Kaspar Lee
  • Minor addition: replace `src=(".*?")` with `src=("[^"]+")` to use pcre engine optimizations. – Alexander Borisov Apr 05 '16 at 09:38
  • @AlexBor Is that more efficient? I have always wondered about this, using a lazy `.*?` or your method. – Kaspar Lee Apr 05 '16 at 09:40
  • 1
    Efficiency depends on the length of the string in the src attribute. With the lazy quantifier `".*?"`, the engine takes n steps (n being the string length). With the greedy `"[^"]+"`, the engine takes a constant number of steps (1), provided a quotation mark follows the engine's cursor position in the string. For really short strings there is no benefit either way, but on long text the greedy version runs faster. You can watch how the regex engine works in the regex debugger: lazy version: https://regex101.com/r/gY5iI2/1, greedy version: https://regex101.com/r/gY5iI2/2. – Alexander Borisov Apr 05 '16 at 10:09
  • @AlexBor Thanks, nice explanation. I'll update the answer, and you get a user profile link! `;)` – Kaspar Lee Apr 05 '16 at 10:21
  • The RegEx fails if the `src`-attribute comes first, since it's checking for "whitespace-optional data-whitespace", while there will be no two whitespaces before the first attribute - this can be fixed by setting one of the two whitespaces to optional. Additionally, if the attributes are chopped down (e.g. ``, you need to add the "single line"-flag. The resulting RegEx would be `/ – CGundlach Nov 07 '19 at 09:16
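
The last comment is cut off, so the pattern it proposed is unknown. The following is only one possible sketch of the idea it describes, not the commenter's RegEx: the "other attributes" part is wrapped in an optional group so that src may come first, and the s (single line) flag lets . match newlines. The input string is made up for illustration.

```php
<?php
// Hypothetical input: src is the first attribute and the tag spans several lines.
$str = "<p><iframe src=\"https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed\"\n    width=\"640\"\n    height=\"360\"></iframe></p>";

// (?:.*?\s+)? makes the leading attributes optional; the s flag lets . match newlines.
$pattern = '/<iframe\s+(?:.*?\s+)?src=("[^"]+").*?<\/iframe>/s';

$res = preg_replace($pattern, '<a href=$1>Link to youtube</a>', $str);

echo $res, PHP_EOL;
```

With the s flag on, a document containing an iframe without a src attribute could let the match run on into a later iframe, so this variant is best kept to input whose shape is known.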
1

A DOM parser such as DOMDocument will not let you down. Unlike regex, it is HTML "aware". I pass two flags to loadHTML() to suppress the automatically generated doctype and wrapper tags, iterate over every <iframe> tag, create a new <a> element for each one, fill it with the desired values, then replace the <iframe> tag with the new <a> tag.

Code: (Demo)

$html = <<<HTML
<p><iframe allowfullscreen="" class="media-element file-default" data-fid="2219" data-media-element="1" frameborder="0" height="360" src="https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed" width="640"></iframe></p>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
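// Copy the live node list to an array first; replacing nodes while iterating the live list can skip elements.
foreach (iterator_to_array($dom->getElementsByTagName('iframe')) as $iframe) {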
    $a = $dom->createElement('a');
    $a->setAttribute('href', $iframe->getAttribute('src'));
    $a->nodeValue = "Link to youtube";
    $iframe->parentNode->replaceChild($a, $iframe);
}
echo $dom->saveHTML();

Output:

<p><a href="https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed">Link to youtube</a></p>
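
If the markup contains several iframes, the same steps can be wrapped in a small helper. This is a sketch only: the function name replaceIframesWithLinks, its parameters, and the example.com URLs are hypothetical and not part of the answer. It takes a snapshot of the node list before looping, because getElementsByTagName() returns a live list and replacing nodes while iterating over it can skip elements.

```php
<?php
// Hypothetical helper; the name and signature are illustrative, not from the answer.
function replaceIframesWithLinks(string $html, string $label): string
{
    $dom = new DOMDocument;
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Snapshot the live node list so replacing nodes does not disturb the loop.
    $iframes = iterator_to_array($dom->getElementsByTagName('iframe'));

    foreach ($iframes as $iframe) {
        $a = $dom->createElement('a');
        $a->setAttribute('href', $iframe->getAttribute('src'));
        $a->appendChild($dom->createTextNode($label));
        $iframe->parentNode->replaceChild($a, $iframe);
    }

    return $dom->saveHTML();
}

// Made-up input: a single root element containing two iframes.
$html = '<div><p><iframe src="https://example.com/a"></iframe></p>'
      . '<p><iframe src="https://example.com/b"></iframe></p></div>';

$out = replaceIframesWithLinks($html, 'Link to youtube');
echo $out;
```

The sample input uses a single root element on purpose, since LIBXML_HTML_NOIMPLIED can rearrange fragments that have several top-level elements.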
mickmackusa
0

The easiest way would be to extract the src attribute with preg_match() and then use it to build an <a> element.

Example:

$string = "<p><iframe allowfullscreen=\"\" class=\"media-element file-default\" data-fid=\"2219\" data-media-element=\"1\" frameborder=\"0\" height=\"360\" src=\"https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed\" width=\"640\"></iframe></p>\n";

if( preg_match( '#src="([^"]*)"#', $string, $matches ) === 1 ){
    $string = '<a href="' . $matches[ 1 ] . '">Link to youtube</a>';
    echo $string;
}

// outputs <a href="https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed">Link to youtube</a>
nass
  • This approach doesn't replace the iframe tag, which is what the author wants. – Alexander Borisov Apr 05 '16 at 09:33
  • 1
    This approach does replace `iframe` tag with `a` tag while using different functions. Author didn't specify that he **needs** to use `preg_replace()`. This approach is easier to understand, write and read and result is the same. – nass Apr 05 '16 at 09:38
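
For readers who want to keep the preg_match() style but also preserve the surrounding markup, one possible variation is sketched below. It is an assumption on my part rather than something either commenter wrote: the pattern matches the whole iframe element, and str_replace() then swaps that exact match for the link.

```php
<?php
// Sample input taken from the question.
$string = '<p><iframe allowfullscreen="" class="media-element file-default" data-fid="2219" data-media-element="1" frameborder="0" height="360" src="https://www.youtube.com/embed/sNEJOm4hSaw?feature=oembed" width="640"></iframe></p>';

// Match the whole iframe element and capture the src value (without quotes) in group 1.
if (preg_match('#<iframe\b[^>]*\bsrc="([^"]*)"[^>]*>.*?</iframe>#s', $string, $matches) === 1) {
    $link = '<a href="' . $matches[1] . '">Link to youtube</a>';
    // Swap the matched iframe markup for the link, leaving the rest of the string untouched.
    $string = str_replace($matches[0], $link, $string);
}

echo $string, PHP_EOL;
```

Here $matches[0] holds the full iframe markup, so the enclosing <p> element is left in place.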