
I want to compare HTML documents to check whether they contain the same tags in the same arrangement, regardless of differences in inner text and attribute values. I only want to compare the general tag structure. For example,

<html>
<head>
</head> 
<body>
<span class="my paragraph">comparison of general tag structure of html</span>
</body>
</html>

and

<html>
<head>
</head> 
<body>
<span class="Mega Offer">free membership offer</span>
</body>
</html>

are the same

but

<html>
<head><title>Different</title>
</head> 
<body>
<span class="my paragraph">comparison of general tag structure of html</span>
</body>
</html>

is not the same, because it has one extra title tag in its tag structure, even though the inner text and attribute values are the same.

GPU..

2 Answers


If you are willing to use PHP, there are several functions, such as preg_match, that search for patterns. You could use file() to read the HTML file into an array, with each line becoming one entry, and then do the same for the other HTML file. Next, find the first tag (i.e. something that starts with <) and read the rest of the line up to >. Then search the other HTML file for the same tag, counting how many times that tag appears. Rinse and repeat.
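As a rough sketch of the same counting idea, here it is in Python's re module rather than PHP's preg_match (a swapped-in language, not what the answer literally proposes). The names TAG_PATTERN and tag_counts are made up for illustration, and the regex is a simplification that assumes reasonably well-formed markup with no ">" inside attribute values.

```python
import re
from collections import Counter

# Simplified pattern (an assumption, not a full HTML grammar):
# captures an optional "/" and the tag name, then skips attributes up to ">".
TAG_PATTERN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

def tag_counts(html):
    """Count each tag name; opening and closing tags are counted separately."""
    return Counter(slash + name.lower() for slash, name in TAG_PATTERN.findall(html))

doc_a = '<html><head></head><body><span class="my paragraph">text</span></body></html>'
doc_b = '<html><head></head><body><span class="Mega Offer">offer</span></body></html>'
doc_c = '<html><head><title>Different</title></head><body><span>text</span></body></html>'

print(tag_counts(doc_a) == tag_counts(doc_b))  # same counts
print(tag_counts(doc_a) == tag_counts(doc_c))  # doc_c has extra title tags
```

Note that equal counts only show that the same tags occur equally often; they do not by themselves prove that the tags are arranged in the same order.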

Noah Huppert

I would go in 2 stages:

Stage 1 (check if equal):
Remove everything between the tags, as well as the attributes, and then compare the results as (case-insensitive) strings.
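One possible way to do this, sketched in Python with the standard library's html.parser module (the answer does not name a language, so this is an assumption). The class TagSkeleton and the functions skeleton and same_structure are hypothetical names. HTMLParser lowercases tag names, which gives the case-insensitive comparison for free.

```python
from html.parser import HTMLParser

class TagSkeleton(HTMLParser):
    """Collects only tag names, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is deliberately ignored; tag is already lowercased by HTMLParser
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")

def skeleton(html):
    """Return the document reduced to its bare tags, as one string."""
    parser = TagSkeleton()
    parser.feed(html)
    parser.close()
    return "".join(parser.tags)

def same_structure(a, b):
    """True if both documents have the same tags in the same order."""
    return skeleton(a) == skeleton(b)

doc_a = '<html><head></head><body><span class="my paragraph">text</span></body></html>'
doc_b = '<HTML><head></head><body><span class="Mega Offer">offer</span></body></HTML>'
doc_c = '<html><head><title>Different</title></head><body><span>text</span></body></html>'

print(same_structure(doc_a, doc_b))
print(same_structure(doc_a, doc_c))
```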

If they differ, also do this:

Stage 2 (Find the difference):
This stage depends heavily on what you want to report as a difference, so I cannot give specific advice on how to implement it.
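As one illustration only, assuming that "difference" means tags that were added or removed in sequence, Python's difflib can diff the two tag lists. The names TAG, tag_sequence and structure_diff are hypothetical, and the regex is a simplification that assumes no ">" inside attribute values.

```python
import difflib
import re

# Simplified tag pattern (an assumption, not a full HTML parser).
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")

def tag_sequence(html):
    """List the tags in document order, without text or attributes."""
    return ["<" + slash + name.lower() + ">" for slash, name in TAG.findall(html)]

def structure_diff(a, b):
    """Return tags only in a (prefixed '- ') or only in b (prefixed '+ ')."""
    lines = difflib.ndiff(tag_sequence(a), tag_sequence(b))
    return [line for line in lines if line.startswith(("- ", "+ "))]

doc_a = '<html><head></head><body><span class="my paragraph">text</span></body></html>'
doc_c = '<html><head><title>Different</title></head><body><span>text</span></body></html>'

for line in structure_diff(doc_a, doc_c):
    print(line)
# + <title>
# + </title>
```

An empty result means the two tag sequences are identical. Other notions of difference (for example, reporting the path to the changed element) would need a tree comparison instead.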

MrSmith42