Replace only the second html tag

Question

I want to change the second h2 tag to an h3, and I hope someone can help me with a replacement regex, or maybe a preg_split. I am not exactly sure which.

For example, this:

<h2>My text one</h2>
<h2>My text two</h2>
text …
<h2>My text three</h2>

Should become this:

<h2>My text one</h2>
<h3>My text two</h3>
text …
<h2>My text three</h2>

Why are you replacing html elements with PHP I think is the question here. Are you sure you do not mean Javascript? — thatidiotguy, Nov 19 '12 at 18:25
You need to give us an idea of the context. Where does this need to happen - client or server side? If server-side, where's the code that prints the header tags? — mowwwalker, Nov 19 '12 at 18:49
Don't use regular expressions to parse HTML. http://htmlparsing.com/regexes.html explains why. Use a proper DOM parser. http://htmlparsing.com/php.html gives some examples. — Andy Lester, Nov 19 '12 at 22:32

Digitalis · Accepted Answer · 2012-11-19T18:44:34.393

I agree with the other comments: this should be done with a DOM parser. Nevertheless, here is a PHP starting point.

<?php 
     // Fill $str with the html;

     // preg_replace returns the new string, so assign the result.
     $str = preg_replace("/[h]{1}[2]/i", "h3", $str);
?>

or

<?php
     // Fill $str with the html;

     // str_replace also returns the new string rather than changing $str in place.
     $str = str_replace("h2", "h3", $str);
?>

This runs, but as written both calls replace every occurrence of h2, not only the second. preg_replace also accepts a $limit argument (fourth parameter) that caps the number of replacements, and a by-reference $count argument (fifth parameter) that reports how many were made. Using a loop, you can control which element gets replaced.

Also, I've overly complicated the regexp for you to be able to swap out the number, to make a more useful function with it. Just using "/(h2)/i" will do the trick too.

So, your code should implement a loop in the correct manner to prevent replacing all of the tags and you should decide if the function is going to handle just h2 or if it should be more flexible.
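
One way to get the "only the second one" behaviour without an explicit loop is to count matches inside a callback. This is only a sketch: the function name replace_nth_h2 is made up for illustration, and the pattern assumes simple, non-nested h2 elements with no `>` inside attribute values.

```php
<?php
// Hypothetical helper: rename only the $n-th <h2> element (1-based) to <h3>.
function replace_nth_h2(string $html, int $n): string
{
    $seen = 0;
    $result = preg_replace_callback(
        '#<h2\b([^>]*)>(.*?)</h2>#is',
        function (array $m) use (&$seen, $n): string {
            $seen++;
            // Leave every match untouched except the $n-th one.
            return $seen === $n ? "<h3{$m[1]}>{$m[2]}</h3>" : $m[0];
        },
        $html
    );
    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$str = "<h2>My text one</h2>\n<h2>My text two</h2>\ntext ...\n<h2>My text three</h2>\n";
echo replace_nth_h2($str, 2);
```

Passing 2 as the second argument targets the second heading; any other position leaves that heading alone and changes a different one, or nothing if there are fewer matches.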

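As a final remark, str_replace is faster than preg_replace, so if this is the only edit you need to make, I would recommend str_replace. Note that str_replace is case-sensitive, unlike the /i pattern above; str_ireplace is the case-insensitive variant.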
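
For completeness, the DOM parser route mentioned at the top could look roughly like this with PHP's bundled DOMDocument. Treat it as a sketch under assumptions: rename_nth_h2_dom is a made-up name, the wrapper div is just a trick to get a single root back out, and character-encoding handling for non-ASCII input is left out.

```php
<?php
// Hypothetical helper: rename the $n-th <h2> (1-based) to <h3> using DOMDocument.
function rename_nth_h2_dom(string $html, int $n): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    // Wrap the fragment in a <div> so there is a single root to read back from.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $old = $doc->getElementsByTagName('h2')->item($n - 1);
    if ($old !== null) {
        $new = $doc->createElement('h3');
        // Copy the attributes, then move the child nodes across.
        foreach ($old->attributes as $attr) {
            $new->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        while ($old->firstChild !== null) {
            $new->appendChild($old->firstChild);
        }
        $old->parentNode->replaceChild($new, $old);
    }

    // Serialise the wrapper's children only, dropping the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = "<h2>My text one</h2>\n<h2>My text two</h2>\ntext ...\n<h2>My text three</h2>\n";
echo rename_nth_h2_dom($str, 2);
```

The parser may normalise whitespace or quoting in the rest of the markup, so the output is not guaranteed to be byte-for-byte identical to the input outside the renamed tag.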

BernaMariano · Answer 2 · 2012-11-19T18:50:19.030

You can easily do that with Javascript, does it really need to be with PHP?

Get the second <h2> value

$text = $("h2:eq(1)").html();

Destroy it.

$("h2:eq(1)").remove();

Create an <h3> after the first <h2>, with $text in it

$("h2:eq(0)").after("<h3>" + $text + "</h3>");

OP didn't say it could not be javascript. – BernaMariano Nov 19 '12 at 18:43
There is a huge distinction between doing something server-side and doing something client-side. He tagged php, asked about php, and never mentioned javascript. Though, he is being very vague about what it is he's actually trying to do and what he's using, so, on second thought, it's possible he means javascript. – mowwwalker Nov 19 '12 at 18:47

Pebbl · Answer 3 · 2012-11-19T21:30:51.557

You do not need a server-side HTML parser for this; that would be complete overkill, imo. The following example could obviously be broken by certain HTML constructs, but for most mark-up it would have no problem whatsoever, and it will be considerably lighter than a server-side HTML parser.

$html = '
<h2>My text one</h2>
<h2>My text two</h2>
text ...
<h2>My text three</h2>
';

preg_match_all

/// the following preg match will find all <h2> mark-up, even if 
/// the content of the h2 splits over new lines - due to the `s` switch
/// It is a non-greedy match too - thanks to the `.+?` so it shouldn't 
/// have problems with spanning over more than one h2 tag. It will only
/// really break down if you have a h2 as a descendant of a h2 - which
/// would be illegal html - or if you have a `>` in one of your h2's
/// attributes i.e. <h2 title="this>will break">Text</h2> which is
/// unusual, as such characters are normally encoded as &gt;.

preg_match_all(
  '#(<)h2([^>]*>.+?</)h2(>)#is',
  $html,
  $matches,
  PREG_OFFSET_CAPTURE|PREG_SET_ORDER
);

replace and rebuild

/// Because you wanted to only replace the 2nd item use the following. 
/// You could however make this code as general or as specific as you wanted.
/// The following works because the surrounding content for the found 
/// $matches was stored using the grouping brackets in the regular 
/// expression. This means you could easily change the regexp, and the 
/// following code would still work.

/// to get a better understanding of what is going on it would be best
/// to `echo '<xmp>';print_r( $matches );echo '</xmp>';`

if ( isset($matches[1][0]) ) {
  $html = substr( $html, 0, $matches[1][0][1] ) . 
          $matches[1][1][0] . 'h3' . 
          $matches[1][2][0] . 'h3' . 
          $matches[1][3][0] .
          substr( $html, $matches[1][0][1] + strlen($matches[1][0][0]) );
}
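
As a rough illustration of making this more general, the same offset-capture idea can be wrapped in a function that takes the tag names and the position as arguments. The name rename_nth_tag and its parameters are invented for this sketch, and the same caveats about nested tags and `>` inside attributes apply.

```php
<?php
// Hypothetical helper built on the offset-capture approach above:
// rename the $n-th <$from> element (1-based) to <$to>.
function rename_nth_tag(string $html, string $from, string $to, int $n): string
{
    $tag = preg_quote($from, '#');
    // \b stops e.g. "h2" from also matching a longer tag name such as "h20".
    $pattern = '#(<)' . $tag . '(\b[^>]*>.+?</)' . $tag . '(>)#is';
    if (!preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER)) {
        return $html; // no match (or a PCRE error): leave the input alone
    }
    if (!isset($matches[$n - 1])) {
        return $html; // fewer than $n matches
    }
    $m = $matches[$n - 1];
    // $m[0] is the whole match as [text, byte offset]; $m[1..3] are the groups.
    [$whole, $offset] = $m[0];
    $replacement = $m[1][0] . $to . $m[2][0] . $to . $m[3][0];
    return substr_replace($html, $replacement, $offset, strlen($whole));
}

$html = "<h2>My text one</h2>\n<h2>My text two</h2>\ntext ...\n<h2>My text three</h2>\n";
echo rename_nth_tag($html, 'h2', 'h3', 2);
```

Using substr_replace with the captured offset keeps everything outside the matched element byte-for-byte identical.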

I have no idea why many are suggesting client-side JavaScript for this change. PHP stands for "PHP: Hypertext Preprocessor"; it is designed to preprocess hypertext. The OP has only ever mentioned PHP functions and tagged this post with PHP, so nothing points towards client-side.

True, whilst client-side code can and should be used where possible to take processing load off the server, it should not be recommended for core structural tags like headings, which screen readers and search engine bots rely on. At best, client-side JavaScript should be used to enhance a user's experience. If you use it to critically augment your site's capabilities, you had better be sure your entire userbase supports it.

Now however, if any of you had mentioned Node.js and JSDOM I would have quite happily agreed.

That remains to be seen. The browser does not need to do a lot of extra work for it. It already has the dom-tree. Point being, the request is to change html that is hardcoded. In my opinion, that should be left to the client side as much as possible. — Digitalis, Nov 19 '12 at 19:34
@Digitalis Sorry I wasn't including client-side in my answer as it wasn't requested by the OP. With regard to changing Heading tags this would not be a good idea to do from JS with regards to SEO. — Pebbl, Nov 19 '12 at 20:36
Why does altering it on the client side negatively impact SEO? Also, no harm done. The example you provided also does what it needs to do and I'm always up for a discussion, as long as it's productive. — Digitalis, Nov 19 '12 at 22:54
@Digitalis sure thing, no worries :) It affects SEO/screen readers because many bots/readers will not execute JavaScript - and so in this example the headers will still remain H2s rather than giving the document outline its correct hierarchical structure i.e. H1, H2, H3. Whilst yes the hit will possibly be minimal for SEO, this will affect screen readers quite heavily... *(all headers will be treated and read out with the same importance - rather confusing and irritating if you are unable to see)* hence the reason why it is better for core mark-up structure to be handled by the server-side. — Pebbl, Nov 19 '12 at 23:54
Plus one, good point. Didn't think of that. To be honest, I prefer switching classes server-side, but that is a very good point you make here. This might go unnoticed for quite some time, but in fact will affect your SEO. Thanks mate! — Digitalis, Nov 20 '12 at 00:05
@Digitalis Cheers oh and no problem, it's not something most would immediately think about -- I've just done too much work in the advertising sector for me to forget ;) — Pebbl, Nov 20 '12 at 00:17
