
I am currently trying to create a regex that strips unnecessary quotation marks from HTML tags. The regex will be used in PHP code.

<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />

should be converted to

<input type=image src="/flags/en.png" alt=English title=English name=en class="screen selected" />
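For reference, the HTML specification allows an attribute value to be written unquoted only if it is non-empty and contains no ASCII whitespace and none of the characters `"`, `'`, `=`, `<`, `>` or `` ` ``. A minimal sketch of that check in PHP, using a hypothetical helper named `canUnquote()`:

```php
<?php
// Hypothetical helper: true if $value may be written as an unquoted HTML
// attribute value (non-empty, no ASCII whitespace, none of " ' = < > `).
function canUnquote(string $value): bool
{
    // \z anchors at the very end, so a trailing newline is not accepted.
    return preg_match('/^[^\t\n\f\r "\'=<>`]+\z/', $value) === 1;
}

var_dump(canUnquote('English'));         // bool(true)
var_dump(canUnquote('/flags/en.png'));   // bool(true)
var_dump(canUnquote('screen selected')); // bool(false)
var_dump(canUnquote(''));                // bool(false)
```

By that rule `/flags/en.png` could also lose its quotes; the `[\w-]+` class used in the patterns below is a stricter, safe subset. One caveat: keep the space before `/>`, because in `name=en/>` the slash would become part of the value.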

I have come up with this regex and replacement:

/(?<=<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"(?=(?:\s[^>]+)?>)/g
$1$2

The problem is that PCRE, the regex engine behind PHP's preg_* functions, does not accept a lookbehind of unbounded length, and `[^>]+?` makes this one unbounded (http://regex101.com/ reports the error).
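A quick way to confirm this in PHP is to compile the lookbehind pattern and check the return value. PHP also has no `g` modifier (preg_replace() already replaces every match by default), so this sketch drops it to isolate the lookbehind error:

```php
<?php
// First pattern from the question, minus the /g flag (PHP has no such modifier).
$pattern = '/(?<=<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"(?=(?:\s[^>]+)?>)/';

// preg_match() returns false if the pattern fails to compile; @ hides the
// "Compilation failed" warning that PHP emits in that case.
$result = @preg_match($pattern, '<input type="image" />');
var_dump($result); // bool(false)
```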

So I modified the pattern to capture the surrounding text instead of using lookarounds:

/(<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"((?:\s[^>]+)?>)/g
$1$2$3$4

Now it's valid but it only strips one set of quotes from each tag.
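The reason is that every match begins at `<` and ends at `>`, so it consumes the whole tag and matches cannot overlap. A quick check with the sample tag, assuming the pattern above without the `g` flag:

```php
<?php
$pattern = '/(<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"((?:\s[^>]+)?>)/';
$subject = '<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />';

// Each match runs from "<" to ">", so the tag yields exactly one match.
var_dump(preg_match_all($pattern, $subject)); // int(1)

// Consequently one preg_replace() call strips only the quotes around "image".
echo preg_replace($pattern, '$1$2$3$4', $subject), "\n";
```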

How do I accomplish this?

BrainStone
  • What is the necessity? – Ben Potter Aug 13 '14 at 21:18
  • Smaller HTML. I do this to speed up my webpage. It is not 100% necessary, but I want to do it since I already started it! – BrainStone Aug 13 '14 at 21:20
  • This is insane micro-optimisation. –  Aug 13 '14 at 21:25
  • Call me insane or OCD, but considering that this could easily save several KiB of code, I think it is worth my time! *(My current optimizations saved me around 10%. Doing this optimization by hand saved an additional 2%. That does make a difference.)* – BrainStone Aug 13 '14 at 21:28
  • OK, well, let's do this properly then: rename all the files to a single letter ("/flags/en.png" to just "f") and add mod_rewrite rules; remove every line break and tab; you don't need optional attributes on the input such as title at all, so strip them ... –  Aug 13 '14 at 21:32
  • I'm trying to **losslessly** optimize the HTML site without changing its content. Stripping things like the title attribute would change the page itself *(how the user sees it)* and have an impact on usability. – BrainStone Aug 13 '14 at 21:59

2 Answers


Try the following:

$pattern = '/(<(?:[^>]+?\s)?)([\w-]+=)"([\w-]+)"((?:\s[^>]+)?>)/';
$replacement = '$1$2$3$4';
$subject = '<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />';

// Each match spans a whole tag, so a single pass strips at most one pair of
// quotes per tag. Repeat until preg_replace() reports zero replacements.
do {
    $subject = preg_replace($pattern, $replacement, $subject, -1, $count);
} while ($count > 0);
var_dump($subject);
paolo
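A single-pass alternative (a sketch, not part of the answer above) is to match each tag with `preg_replace_callback()` and strip the quotes inside it with a simpler inner pattern. `stripAttributeQuotes()` is a hypothetical name; like the regexes above, it assumes no `>` appears inside attribute values and does not special-case comments or `<script>` contents.

```php
<?php
// Hypothetical helper (a sketch): strips quotes around simple values in every
// tag in a single pass. Assumes no ">" inside attribute values and does not
// special-case comments or <script> contents.
function stripAttributeQuotes(string $html): string
{
    // Outer pattern: hand each tag, from "<" to the next ">", to the callback.
    return preg_replace_callback('/<[^>]+>/', function (array $tag): string {
        // Inner pattern: name="value" where value is only word characters or
        // hyphens, the same conservative set used in the question.
        return preg_replace('/([\w-]+=)"([\w-]+)"/', '$1$2', $tag[0]);
    }, $html);
}

echo stripAttributeQuotes('<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />'), "\n";
// <input type=image src="/flags/en.png" alt=English title=English name=en class="screen selected" />
```

Because the inner replacement only ever sees one tag, text between tags is left alone and no loop is needed.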

Probably won't save much, but here you go :)

$string = '<input type="image" src="/flags/en.png" alt="English" title="English" name="en" class="screen selected" />';
// Drop the quotes around values that consist only of letters (case-insensitive).
echo preg_replace('/="([a-z]+)"/i', '=$1', $string);

Output:

<input type=image src="/flags/en.png" alt=English title=English name=en class="screen selected" />
Erlesand
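One caveat with this pattern: it has no notion of tags, so it also rewrites `="word"` sequences in text content, and `[a-z]+` leaves values containing digits or hyphens quoted. A small illustration with a made-up input:

```php
<?php
// Made-up input: two attributes plus some text that merely looks like one.
$html = '<p title="Hello" lang="en-gb">Set x="y" here</p>';

// The answer's pattern: quoted values made only of letters.
echo preg_replace('/="([a-z]+)"/i', '=$1', $html), "\n";
// <p title=Hello lang="en-gb">Set x=y here</p>
```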