
I extract fields such as the publication year, title and authors from an article like this:

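// Authors: every <b> that is the first <b> of its parent, inside a table with cellpadding="6".
$aut = $xpath->query("//table[@cellpadding='6']//b[1]");
$authors = array();
foreach ($aut as $node) {
    $authors[] = $node->nodeValue;
}
// Title: text of the second <h3> in the document.
$title = $doc->getElementsByTagName('h3')->item(1)->nodeValue;
// Publication year: first matching text node in a first <p> that has a <br> after it.
$publicationYear = $xpath->query("//p[1]//text()[(following::br)]")->item(0)->nodeValue;
// DOI: last matching text node in a second <p> that has a <br> before it, without its first 4 characters.
$aux = $xpath->query("//p[2]//text()[(preceding::br)]");
$doi = substr($aux->item($aux->length - 1)->nodeValue, 4);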

For every string (the full name, year, title) I also need all the ancestor tags that come before it, like:

form1_table3_tbody1_tr1_td1_table5_tbody1_tr1_td2_p2

and the position within that tag, e.g. start: 163, end: 190. I only know that this information is grouped in certain tags, but I also need the index of each tag when it has siblings; that is why the example has table3 for the third child of form1. Is there a way to do this in PHP, or at least in JavaScript?
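To make the path format concrete, here is a minimal sketch of how such a string could be built by walking up from a node to the root. The function name tagPath is hypothetical, and it assumes the index counts only siblings with the same tag name (so the third <table> child is table3). The demo uses a small well-formed fragment loaded with loadXML rather than the real article.

```php
<?php
// Hypothetical helper: builds a path such as "form1_table3_tr1_td1_p2".
// Assumption: the number after each tag is its 1-based position among
// preceding siblings that share the same tag name.
function tagPath(DOMNode $node): string
{
    // For a text node, start from the element that contains it.
    if (!($node instanceof DOMElement)) {
        $node = $node->parentNode;
    }
    $parts = [];
    for ($n = $node; $n instanceof DOMElement; $n = $n->parentNode) {
        $index = 1;
        for ($s = $n->previousSibling; $s !== null; $s = $s->previousSibling) {
            if ($s instanceof DOMElement && $s->nodeName === $n->nodeName) {
                $index++;
            }
        }
        array_unshift($parts, $n->nodeName . $index);
    }
    return implode('_', $parts);
}

// Made-up fragment for illustration only.
$doc = new DOMDocument();
$doc->loadXML('<form><table/><table/><table><tr><td><p>a</p><p>b</p></td></tr></table></form>');
$xpath = new DOMXPath($doc);
$p = $xpath->query('//p')->item(1);   // the second <p>

echo tagPath($p), "\n";               // form1_table3_tr1_td1_p2
```

With a document loaded through loadHTML, the path would also include the html and body elements at the front.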

UPDATE: In the article I have:

...
<td valign="top"> 
<h3 class="blue-space">D-Lib Magazine</h3>
<p class="blue">November/December 2014<br>
Volume 20, Number 11/12<br><a href="http://www.dlib.org/dlib/november14/brook/../11contents.html" target="_blank">Table of Contents</a>
</p>
...

and $publicationYear from the first snippet gets the value 2014. The first snippet works fine. I need to create three more variables, for example $fathers = ...td1_p1, $start = 18, $end = 22
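The $start and $end values can be illustrated with a small sketch that searches for the extracted string inside the text of its paragraph. The function name textOffsets is hypothetical, and the HTML is a shortened copy of the snippet above wrapped in a table so that it parses on its own. It assumes offsets are counted from 0 within the paragraph's text, with $end pointing just past the match.

```php
<?php
// Hypothetical helper: returns the start/end offsets of $needle within the
// text content of $element, or null when the text is not found.
// strpos/strlen count bytes; text with multibyte characters would need
// mb_strpos/mb_strlen to get character offsets instead.
function textOffsets(DOMNode $element, string $needle): ?array
{
    $start = strpos($element->textContent, $needle);
    if ($start === false) {
        return null;
    }
    return ['start' => $start, 'end' => $start + strlen($needle)];
}

// Shortened copy of the article snippet, for illustration only.
$html = '<table><tr><td valign="top">'
      . '<h3 class="blue-space">D-Lib Magazine</h3>'
      . '<p class="blue">November/December 2014<br>Volume 20, Number 11/12<br></p>'
      . '</td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$p = $xpath->query('//td/p')->item(0);   // first <p> inside the <td>

$pos = textOffsets($p, '2014');
echo $pos['start'], ' ', $pos['end'], "\n";   // 18 22
```

Searching the text of the paragraph, rather than of a single text node, keeps the offsets relative to the tag named at the end of the path.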

  • Your question is a bit confusing. Please give a simple example input and the corresponding example output. Also explain what does not work with your current code, and what error you get, if any. – hoijui Jul 08 '15 at 11:05
  • Still confusing after the edit. Where should "td1_p1" come from? What are you trying to achieve exactly? What are your parsing / transformation rules? – Kaii Jul 08 '15 at 13:13
  • In the last HTML example, as you can see, td1 stands for the first <td> and p1 stands for the first <p> inside that <td>. Once I have the text I need, I also need its coordinates inside this paragraph (the start and end index). – Claudiu Ep Jul 08 '15 at 13:19
