You could try the Generic XML parser class (also available on GitHub).
According to the author's description:
- Parses arbitrary XML input and builds an array with the structure of all tag and data elements.
- It can validate and extract data from a whole XML document with just a single call. It supports validating common tag value data types and can perform custom validations using a subclass.
- Optionally, keeps track of the positions of each element to allow the determination of the exact location of elements that may be contextually in error.
- Supports a parsed file cache to minimize the overhead of parsing the same file repeatedly.
- Optimized parsing of simplified XML (SML) formats ignoring the tag attributes.
- Validates and extracts data from a whole XML document with a single function call.
I've tested it with this code:
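<?php
require('xml_parser.php');

$file_name = 'test.xml';
// The third argument (1) turns on position tracking;
// the fourth is the path of the parsed-file cache.
$error = XMLParseFile($parser, $file_name, 1, $file_name.'.cache');
if (strlen($error)) {
    // Assumption: XMLParseFile returns a non-empty error message on failure.
    die("Parse error: $error\n");
}
// Print the recorded position of every <p> element.
foreach ($parser->structure as $key => $val) {
    if (is_array($val) && isset($val['Tag']) && !strcasecmp($val['Tag'], 'p')) {
        print_r($parser->positions[$key]);
    }
}
?>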
The test.xml file contains your sample HTML snippet.
By running the script from the command line I get this output:
Array
(
    [Line] => 2
    [Column] => 7
    [Byte] => 12
)
Array
(
    [Line] => 3
    [Column] => 7
    [Byte] => 80
)
So the Byte field is probably what you're looking for.
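If it helps, here is a minimal sketch of how such an offset might be used to pull the matching text out of the original file. It assumes Byte is a zero-based byte offset into the file, which I have not verified, so check it against your own data (in particular whether it points at the start of the tag or just past it). The helper excerpt_at_byte and the sample string and offsets below are hypothetical stand-ins, not part of the parser class.

```php
<?php
// Hypothetical helper: return up to $length bytes of $source starting at $byte.
// Assumes $byte is a zero-based byte offset; out-of-range offsets yield ''.
function excerpt_at_byte(string $source, int $byte, int $length = 40): string
{
    if ($byte < 0 || $byte >= strlen($source)) {
        return '';
    }
    return substr($source, $byte, $length);
}

// Stand-in data. In practice $source would come from file_get_contents('test.xml')
// and each offset from $parser->positions[$key]['Byte'].
$source = "<div>\n<p>First</p>\n<p>Second</p>\n</div>\n";
$offsets = [6, 19]; // hand-picked for this sample string, not parser output

foreach ($offsets as $byte) {
    echo $byte, ': ', excerpt_at_byte($source, $byte, 12), "\n";
}
```

Note that substr works on bytes rather than characters, which is what you want for a byte offset, but an excerpt may cut a multibyte character in half if the document is UTF-8.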
To better understand how it works, also have a look at its source code.