I am trying to parse a string of HTML tag attributes in PHP. There are three cases:

attribute="value"  // the quoted value can contain anything, including escaped quotes
attribute          // no value
attribute=value    // unquoted, so the value contains only alphanumeric characters

Can someone help me find a regex that captures the attribute name in the first group and the attribute value (if present) in the second?

Aamir
mck89
  • Why are you trying to do this with a regular expression? A real HTML parser is a much easier approach. – Quentin Sep 04 '09 at 15:47
  • Because I'm building my own library and I can't take the code from another one. – mck89 Sep 04 '09 at 15:49
  • Because I want to do it like this. – mck89 Sep 04 '09 at 15:57
  • You mean you want us to tell you how to use a glass bottle for pounding your nails? http://weblogs.asp.net/alex_papadimoulis/archive/2005/05/25/408925.aspx – soulmerge Sep 04 '09 at 16:00
  • @mck89: please don't feel attacked; that wasn't my intention. It's just that you *are* going about this the **hard** way. I didn't mean to offend you, I just wanted to know why you couldn't use either existing code or an HTML parser. – Esteban Küber Sep 04 '09 at 16:03
  • @voyager: don't worry. Anyway, I found a good solution, and that is the important thing. – mck89 Sep 04 '09 at 16:06
  • possible duplicate of [Can you provide some examples of why it is hard to parse XML and HTML with a regex?](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege) – Brad Mace Jul 09 '11 at 21:01
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Paŭlo Ebermann Sep 15 '11 at 14:08

2 Answers

Never ever use regular expressions for processing HTML, especially if you're writing a library and don't know what your input will look like. Take a look at SimpleXML, for example.
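As a rough illustration of the parser-based route, here is a minimal sketch. It swaps in `DOMDocument` rather than SimpleXML, because SimpleXML expects well-formed XML and a bare attribute without a value is not well-formed. The helper name `parseAttributesWithDom` and the dummy `<div>` wrapper are assumptions made for this example, and it assumes PHP's DOM extension is enabled.

```php
<?php
// Hypothetical helper: wraps the attribute string in a dummy <div> so that
// PHP's HTML parser (DOMDocument, backed by libxml) can read the attributes.
function parseAttributesWithDom(string $attributeString): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div ' . $attributeString . '></div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $attributes = [];
    $element = $doc->getElementsByTagName('div')->item(0);
    if ($element !== null) {
        // Each item in the attribute map is a DOMAttr with name and value.
        foreach ($element->attributes as $attr) {
            $attributes[$attr->name] = $attr->value;
        }
    }
    return $attributes;
}

$attrs = parseAttributesWithDom('class="a b" disabled id=main');
```

With this input, `$attrs['class']` is `'a b'` and `$attrs['id']` is `'main'`. The value reported for a bare attribute such as `disabled` depends on the parser, so testing for its presence with `array_key_exists` is the safer check.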

Community
soulmerge

Give this a try and see if it is what you want to extract from the tags.

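// Match each attribute in one of three forms: name="value", name=value,
// or a bare name. The leading space keeps the tag name itself from matching.
preg_match_all('/( \w+="\w+"| \w+=\w+| \w+)/',
    $content,
    $result,
    PREG_PATTERN_ORDER);
// $result[0] holds the full matches, one string per attribute.
$result = $result[0];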

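The regex matches each attribute, skips the tag name, and puts the matches in an array that you can loop over. Each entry is the whole attribute text (for example ` name="value"`), not a separate name and value, and the quoted form only matches values made of word characters.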
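To get the name and the value as separate captures, as the question asks, the pattern could be extended along these lines. This is only a sketch under stated assumptions: the function name `parseAttributes` is made up for the example, "escaped quotes" is taken to mean backslash-escaped (`\"`), attribute names are assumed to consist of word characters and hyphens, and the input is assumed to be the attribute string alone, without the tag name.

```php
<?php
// Hypothetical helper: returns [name => value], with null for bare attributes.
function parseAttributes(string $attributeString): array
{
    // Group 1: attribute name.
    // Group 2: double-quoted value, allowing backslash-escaped characters.
    // Group 3: unquoted value (word characters only).
    $pattern = '/([\w-]+)(?:\s*=\s*(?:"((?:[^"\\\\]|\\\\.)*)"|(\w+)))?/s';

    preg_match_all(
        $pattern,
        $attributeString,
        $matches,
        PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
    );

    $attributes = [];
    foreach ($matches as $m) {
        // Use the quoted value if present, else the unquoted one, else null.
        $attributes[$m[1]] = $m[2] ?? $m[3] ?? null;
    }
    return $attributes;
}

$attrs = parseAttributes('class="a \"b\"" disabled id=main');
```

Here `$attrs['class']` keeps the backslashes as written (`a \"b\"`), `$attrs['disabled']` is `null`, and `$attrs['id']` is `'main'`. The pattern is not anchored, so it silently skips anything it does not recognise; for input you do not control, a real parser remains the more robust choice.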

Peter Ajtai
JasonBartholme
  • I found a faster and more precise solution, but I tried your regex and it seems to work, so it's a good starting point and I'm accepting your answer as the solution. Thank you! – mck89 Sep 04 '09 at 16:12
  • https://stackoverflow.com/a/70079532/14344959 – Harsh Patel Nov 23 '21 at 10:50