Sorry for the long post. Please read to the end to understand what I am trying to accomplish and where I am stuck!
I have a table like this:
<html>
<body>
<table class="searchTable" cellspacing="0" cellpadding="5" style="width: 100%;">
<tbody>
<tr>
<th>Book</th>
<th>Model Name</th>
<th>Description</th>
<th>Category</th>
</tr>
<tr>
<td>
<a onclick="getFullData( '', 'K0072', 'B20' );" href="">K0072</a>
</td>
<td>B20</td>
<td>K0072 Description</td>
<td>K0072 Category</td>
</tr>
<tr>
<td>
<a onclick="getFullData( '', 'K0074', 'B2004' );" href="">K0074</a>
</td>
<td>B2004</td>
<td>K0074 Description</td>
<td>K0074 Category</td>
</tr>
<tr>
<td>
<a onclick="getFullData( '', 'K0081', 'B2005' );" href="">K0081</a>
</td>
<td>B2005</td>
<td>K0081 Description</td>
<td>K0081 Category</td>
</tr>
</tbody>
</table>
</body>
</html>
Please note that I am fetching the data from another website with a cURL POST request, which means I have no control over the HTML.
I am able to generate the following array from the above HTML using DOMDocument:
array (size=3)
0 =>
array (size=4)
0 => string 'K0072' (length=5)
1 => string 'B20' (length=3)
2 => string 'K0072 Description' (length=17)
3 => string 'K0072 Category' (length=14)
1 =>
array (size=4)
0 => string 'K0074' (length=5)
1 => string 'B2004' (length=5)
2 => string 'K0074 Description' (length=17)
3 => string 'K0074 Category' (length=14)
2 =>
array (size=4)
0 => string 'K0081' (length=5)
1 => string 'B2005' (length=5)
2 => string 'K0081 Description' (length=17)
3 => string 'K0081 Category' (length=14)
This is my code:
$doc = new DOMDocument();
// Parser options are read at load time, so set them before loadHTML().
$doc->preserveWhiteSpace = false;
$doc->loadHTML( getHtml() );
$doc->encoding = 'UTF-8';
$tables = $doc->getElementsByTagName( 'table' );
$dataArray = array();
foreach( $tables as $table ) {
    if ( $table->getAttribute( 'class' ) !== 'searchTable' ) {
        continue;
    }
    // Query the matched table rather than the whole document, so cells
    // from other tables are not picked up.
    $data = $table->getElementsByTagName( 'td' );
    $tempArray = array();
    foreach ( $data as $td ) {
        $tempArray[] = trim( $td->textContent );
        // Every 4 cells make up one row.
        if ( count( $tempArray ) === 4 ) {
            $dataArray[] = $tempArray;
            $tempArray = array();
        }
    }
}
var_dump( $dataArray );
die();
The problem is that I am not able to extract the argument values of the getFullData call for each record. I need them because I have to build URLs from the arguments, for example: https://pi.php?part=K0072&m=B20
<tr>
<td>
<a onclick="getFullData( '', 'K0072', 'B20' );" href="">K0072</a>
</td>
...
</tr>
I saw somewhere (can't remember where now! :( ) that DOMXPath can be used to find DOM elements by attribute, so here I could probably use the onclick attribute.
But the problem is that the source document has other anchors as well, which call different methods. This means I would end up with unnecessary records that I have to filter out in PHP.
Is there a way to extract only those anchors that call the getFullData method? Also, how would I extract the argument values?
In the end, the final array has to look like this:
End of day, the final array has to look like this:
array (size=3)
0 =>
array (size=5)
0 => string 'K0072' (length=5)
1 => string 'B20' (length=3)
2 => string 'K0072 Description' (length=17)
3 => string 'K0072 Category' (length=14)
4 => string 'https://pi.php?part=K0072&m=B20'
1 =>
array (size=5)
0 => string 'K0074' (length=5)
1 => string 'B2004' (length=5)
2 => string 'K0074 Description' (length=17)
3 => string 'K0074 Category' (length=14)
4 => string 'https://pi.php?part=K0074&m=B2004'
2 =>
array (size=5)
0 => string 'K0081' (length=5)
1 => string 'B2005' (length=5)
2 => string 'K0081 Description' (length=17)
3 => string 'K0081 Category' (length=14)
4 => string 'https://pi.php?part=K0081&m=B2005'
Any suggestions?
UPDATE:
Thank you, Chris Hass, for pointing me in the right direction. Taking the idea, I tried this and got some promising results:
$xPath = new DOMXPath( $doc );
$ancs = $xPath->query( "//a[@onclick]" );
foreach( $ancs as $a ) {
    var_dump( $a->getAttribute( 'onclick' ) );
}
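Building on that, here is a minimal sketch of where this might go. It is not a definitive solution. It narrows the XPath query to anchors whose onclick contains getFullData, pulls the arguments out with a regular expression, and appends the URL to each row. The assumptions are mine: the arguments are always three single-quoted strings, the second maps to part and the third to m, and the otherMethod anchor in the sample markup is a made-up stand-in for the unrelated links.

```php
<?php
// Sample markup based on the table above, plus one hypothetical anchor
// (otherMethod) standing in for the unrelated links on the real page.
$html = <<<'HTML'
<table class="searchTable">
<tr><th>Book</th><th>Model Name</th><th>Description</th><th>Category</th></tr>
<tr>
<td><a onclick="getFullData( '', 'K0072', 'B20' );" href="">K0072</a></td>
<td>B20</td><td>K0072 Description</td><td>K0072 Category</td>
</tr>
<tr>
<td><a onclick="otherMethod( 'x' );" href="">Skip me</a></td>
<td>-</td><td>-</td><td>-</td>
</tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML( $html );
$xPath = new DOMXPath( $doc );

// Only anchors inside the search table whose onclick mentions getFullData.
$query = "//table[@class='searchTable']//a[contains(@onclick, 'getFullData')]";

// Capture the three single-quoted arguments; any of them may be empty.
$pattern = "/getFullData\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/";

$dataArray = array();
foreach ( $xPath->query( $query ) as $a ) {
    if ( ! preg_match( $pattern, $a->getAttribute( 'onclick' ), $m ) ) {
        continue; // onclick did not have the expected shape
    }

    // Walk up to the enclosing row and collect its cell texts.
    $row = array();
    foreach ( $xPath->query( 'ancestor::tr[1]/td', $a ) as $td ) {
        $row[] = trim( $td->textContent );
    }

    // Assumed mapping: 2nd argument -> part, 3rd argument -> m.
    $row[] = 'https://pi.php?' . http_build_query( array( 'part' => $m[2], 'm' => $m[3] ) );
    $dataArray[] = $row;
}

var_dump( $dataArray );
```

Starting from the anchor and walking up to its row keeps the URL and the cell values together, so rows whose anchor calls some other method never enter the array. If the real onclick values use double quotes or a different number of arguments, the pattern would need adjusting.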