3

This problem came up today. I'm working on a web-based (Struts 2) project with lots of JSPs, and most of the input, select, table and a elements are defined with only the name attribute and no id, such as:

<input name="myname" class="myclass" value="" type="text"/>

So far so good, except that unfortunately there's a lot of JavaScript validation for those fields, and as far as I could read the code before leaving, most of it refers to the elements with document.getElementById.

The catch is that this is an old application (not that old, actually) that is only compatible with IE 6 and IE 7. I didn't search the net to understand how IE manages to find an element by its name attribute alone, but I guess it must do something. No surprise that every other browser complains and cries.

So I'm trying to come up with a simple solution: find all JSPs that define input, select, table and a elements with the name attribute but no id, in order to fix the HTML.

Using my good ol' friend http://rubular.com I came up with the following:

/<(?:(input|select|a|table))\s+((?!id).)*>

This will catch every referred element without id. But how can I actually assert that only those with name are matched?

Oh, another important point: each element is defined on a single line, so it's unlikely that there are things such as:

<input name="..."
       class="..."/>
resilva87
  • `<(?:(input|select|a|table))\s+(?!.*id=)(?=.*name=).*>` This one makes quite a few assumptions: `=` is not used inside the id/name value (e.g. name="weird="), and the tag is not followed by another tag on the same line. – nhahtdh Nov 08 '12 at 02:19
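
To make the problem with the pattern in the question concrete, here is a minimal JavaScript sketch. The sample tags are hypothetical, not taken from the real JSPs. It suggests two weaknesses: the pattern never checks for name at all, and `((?!id).)*` rejects any tag that merely contains the letters "id" anywhere, for example inside "grid" or "width".

```javascript
// The pattern from the question, without Rubular's leading "/".
const original = /<(?:(input|select|a|table))\s+((?!id).)*>/;

// Hypothetical sample tags, used only for illustration.
const noNameNoId = '<input class="myclass" type="text"/>';
const nameOnly = '<table name="grid" width="100%">';

// Matched even though the tag has no name attribute.
const matchesTagWithoutName = original.test(noNameNoId);

// Not matched, because "grid" and "width" contain the letters "id".
const matchesNameOnlyTag = original.test(nameOnly);

console.log(matchesTagWithoutName, matchesNameOnlyTag);
```

Any fix therefore has to anchor "id" as an attribute name (word boundary plus `=`) and add a separate check for name.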

4 Answers

6

Try this:

<(?:input|select|a|table)\s+(?=[^>]*\bname\s*=)(?![^>]*\bid\s*=)[^>]*>

Explanation:

<                           "<"
(?:input|select|a|table)    One of "input", "select", "a", "table"
\s+                         Whitespace
(?=                         Positive lookahead
[^>]*                           Anything up to but excluding ">"
\b                              Word boundary
name                            "name"
\s*                             Possible whitespace
=                               "="
)
(?!                         Negative lookahead
[^>]*                           Anything up to but excluding ">"
\b                              Word boundary
id                              "id"
\s*                             Possible whitespace
=                               "="
)
[^>]*                       Anything up to but excluding ">"
>                           ">"
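
As a quick sanity check, the pattern above can be run against a few tags in JavaScript. This is only a sketch: the sample markup is hypothetical, and the `g` flag is added here so that every tag in the string is reported.

```javascript
// The pattern explained above, with the "g" flag added to find every tag.
const nameWithoutId =
  /<(?:input|select|a|table)\s+(?=[^>]*\bname\s*=)(?![^>]*\bid\s*=)[^>]*>/g;

// Hypothetical sample markup, used only for illustration.
const sample = [
  '<input name="myname" class="myclass" value="" type="text"/>', // name only
  '<input id="a" name="a" type="text"/>',                        // has an id
  '<select class="plain">',                                      // no name
  '<table name="grid" width="100%">',                            // name only
].join('\n');

// String.prototype.match returns null when nothing matches, hence the fallback.
const matches = sample.match(nameWithoutId) || [];
console.log(matches);
```

Only the first and last tags should be reported. Thanks to the word boundary and the required `=`, the letters "id" inside "grid" and "width" do not count as an id attribute.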
MRAB
2

Disclaimer:

Everyone will tell you NOT to use regex to parse HTML, and they are right. That said, the following regex solution should do a pretty decent job for a one-off task (if 100% reliability is not a concern).

Regex to match a tag having one attribute but not another:

The following tested PHP script uses a (fully commented) regex to match the start tags of INPUT, SELECT, TABLE and A elements which have a NAME attribute but no ID attribute. The script inserts a new ID attribute into each start tag that is the same as the existing NAME attribute:

<?php // test.php Rev:20121107_2100
$re = '%
    # Match HTML 4.01 element start tags with NAME but no ID attrib.
    (                      # $1: Everything up to tag close delimiter.
      <                    # Start tag open delimiter.
      (?:input|select|table|a)\b  # Element name.
      (?:                  # Zero or more attributes before NAME.
        \s+                # Attributes are separated by whitespace.
        (?!name\b|id\b)    # Only non-NAME, non-ID before NAME attrib.
        [A-Za-z][\w\-:.]*  # Attribute name is required.
        (?:                # Attribute value is optional.
          \s*=\s*          # Name and value separated by =
          (?:              # Group for value alternatives.
            "[^"]*"        # Either a double-quoted string,
          | \'[^\']*\'     # or a single-quoted string,
          | [\w\-:.]+      # or a non-quoted string.
          )                # End group of value alternatives.
        )?                 # Attribute value is optional.
      )*                   # Zero or more attributes before NAME.
      \s+                  # NAME attribute is separated by whitespace.
      name                 # NAME attribute name is required.
      \s*=\s*              # Name and value separated by =
      (                    # $2: NAME value.
        "[^"]*"            # Either a double-quoted string,
      | \'[^\']*\'         # or a single-quoted string,
      | [\w\-:.]+          # or a non-quoted string.
      )                    # $2: NAME value.
      (?:                  # Zero or more attributes after NAME.
        \s+                # Attributes are separated by whitespace.
        (?!id\b)           # Only non-ID attribs after NAME attrib.
        [A-Za-z][\w\-:.]*  # Attribute name is required.
        (?:                # Attribute value is optional.
          \s*=\s*          # Name and value separated by =
          (?:              # Group for value alternatives.
            "[^"]*"        # Either a double-quoted string,
          | \'[^\']*\'     # or a single-quoted string,
          | [\w\-:.]+      # or a non-quoted string.
          )                # End group of value alternatives.
        )?                 # Attribute value is optional.
      )*                   # Zero or more attributes after NAME.
    )                      # $1: Everything up to close delimiter.
    # Insert missing ID attribute here...
    (\s*/?>)               # $3: Start tag close delimiter.
    %ix';
$html = file_get_contents('testdata.html');
$html = preg_replace($re, "$1 id=$2$3", $html);
file_put_contents('testdata_out.html', $html);
?>
ridgerunner
0

In old versions of Internet Explorer (IE 7 and earlier), getElementById also returns an element whose name attribute equals the requested id. Other browsers (Firefox, Chrome, Safari etc.) match on the id attribute only, so the lookup fails there.

The following code works around this by copying the name to the id wherever the id is missing.

function includeIdIfNotExist(element) {
    var id = element.getAttribute('id');
    var name = element.getAttribute('name');
    if (name && !id) {
        element.id = name;
    }
}
function addMissingId() {
    var elementsToAddId = ['input', 'select'];
    for (var j = 0; j < elementsToAddId.length; j++) {
        var inputElements = document.getElementsByTagName(elementsToAddId[j]);
        for (var i = 0; i < inputElements.length; i++) {
            includeIdIfNotExist(inputElements[i]);
        }
    }
}
// Assign the function itself (no parentheses) so it runs once the page has loaded.
window.onload = addMissingId;
Amit Garg
0

If you want to find elements that have a name but no id, and set the id equal to the name, you can do a find and replace as follows:

Find:

(<(?:input|select|table|form|textarea)\s+)(?=[^>]*\bname\s*="(\w+)")((?![^>]*\bid\s*=)[^>]*>)

It's made to work with input, select, table, form and textarea. You can add or remove HTML tags in the input|select|table|form|textarea part. It also checks that the element does not already have an id.

Replace with:

$1id="$2" $3

This adds id="[nameValue]" to every matched tag, that is, every tag that has a name but no id.
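
For anyone who prefers to script this rather than use an editor's find and replace, the same pair can be applied with JavaScript's String.prototype.replace. This is a minimal sketch: the helper name `addIdFromName` and the sample tags are hypothetical, and the `g` flag is added so that every tag in the input is rewritten.

```javascript
// Find pattern from above, with the "g" flag so every tag is rewritten.
const find =
  /(<(?:input|select|table|form|textarea)\s+)(?=[^>]*\bname\s*="(\w+)")((?![^>]*\bid\s*=)[^>]*>)/g;

// Hypothetical helper; $1, $2 and $3 refer to the capture groups above.
function addIdFromName(html) {
  return html.replace(find, '$1id="$2" $3');
}

const before = '<input name="myname" class="myclass" value="" type="text"/>';
const after = addIdFromName(before);
console.log(after);
// → <input id="myname" name="myname" class="myclass" value="" type="text"/>

// A tag that already carries an id is returned unchanged.
const untouched = addIdFromName('<select id="s" name="s">');
console.log(untouched);
```

Note that `(\w+)` only captures name values made of word characters in double quotes, so a tag with a name such as "user.name" or a single-quoted name would be skipped.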

Hope it helps!

JoelBonetR