
OK, so I have been battling with this for a while now; maybe someone can help me.

I'm trying to get the email link from this HTML:

<div id="field_11" class="fieldRow span12 lastFieldRow">
  <span class="caption">E-mail</span>
  <span class="output">
   <script type="text/javascript">
    <!--
     document.write('<a hr'+'ef="mai'+'lto'+':'+
      '%40;%67;%6d;%61;%69;%6c;<\/a>');
    //-->
   </script>
   <a href="mailto:%40%67%6d%61%69%6c">@mail</a>
  </span>
</div>

I'm trying to get the '@mail' text of the last tag in the code, the <a href="mailto:..."> element, NOT the link written by document.write().

For some reason, whenever I try to get the children of the span with the output class, it thinks the span has only one child, the script tag, and I just can't grab the plain-text email.

Here is what I have so far:

include 'simple_html_dom.php';

$target_url = "some_web_site";
$html = new simple_html_dom();
$html->load_file($target_url);

foreach ($html->find('span[class=output]') as $d) {
    // children(1) is the second child element of the span
    echo $d->children(1)->plaintext . "<br />";
}

Any help?

  • Your code should work. What's the output (or the error message)? –  Apr 29 '14 at 17:46
  • It prints out a bunch of these errors: Notice: Trying to get property of non-object in /Applications/MAMP/htdocs/webcrawler/index.php on line 224 – user3586322 Apr 29 '14 at 17:47
  • Sounds like your `load_file()` isn't loading correctly. Can you try removing the 2nd and 3rd lines (both beginning with `$html`) and replacing them with `$html = file_get_html($target_url);`? – Sunny Patel Apr 29 '14 at 18:12
  • @LaughDonor: I tried your approach and still got those errors. – user3586322 Apr 29 '14 at 18:44
  • Well, the main reason you're having this problem is `$html->find('span[class=output]')` is returning `null`. You need to check to make sure your selectors are correct. Maybe try `span.output` instead? – Sunny Patel Apr 29 '14 at 18:50
  • It gives me the same error with 'span.output'. Could the website be blocking me, or is it something else? – user3586322 Apr 29 '14 at 19:28
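The notice appears when `->plaintext` is read from something that is not an object. Assuming simple_html_dom's `children($idx)` returns `null` when there is no child at that index, any span.output with fewer than two child elements in the fetched page would trigger it. A minimal guarded sketch of the same loop is below; it swaps in PHP's built-in DOM extension instead of simple_html_dom, and the function name `mailtoTextsFromOutputSpans()` is hypothetical. It only reads text from `<a>` children that actually exist and have a mailto: href.

```php
<?php
// Sketch: a null-safe version of the question's loop, using PHP's DOM
// extension instead of simple_html_dom. The function name is hypothetical.
function mailtoTextsFromOutputSpans(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $dom->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($dom->getElementsByTagName('span') as $span) {
        // Treat "output" as one token of a possibly multi-class attribute.
        $classes = preg_split('/\s+/', trim($span->getAttribute('class')));
        if (!in_array('output', $classes, true)) {
            continue;
        }
        foreach ($span->childNodes as $child) {
            // Skip text nodes, comments and the <script> element.
            if (!($child instanceof DOMElement) || strtolower($child->tagName) !== 'a') {
                continue;
            }
            if (strpos($child->getAttribute('href'), 'mailto:') === 0) {
                $texts[] = trim($child->textContent);
            }
        }
    }
    return $texts;
}
```

The script's contents are never executed by the parser, so the link built by document.write() is just script text and is not returned; only the static `<a>` element is.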

1 Answer


This is also possible with just PHP's built-in DOM extension and XPath.

$dom = new DOMDocument();
$dom->loadHTML($html); // $html holds the page source as a string
//$dom->loadHTMLFile($htmlFile);
$xpath = new DOMXPath($dom);

var_dump(
  $xpath->evaluate(
    'string(//span[@class="output"]//a[starts-with(@href, "mailto:")])'
  )
);

Output: https://eval.in/148063

string(5) "@mail"

The XPath first selects all span elements whose class attribute is exactly "output":

//span[@class="output"]
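The @class="output" test compares the whole attribute value, so a span with class="output extra" would not match. The sketch below compares it with the common XPath 1.0 token-match idiom, which is an alternative and not what this answer uses; `countMatches()` is a hypothetical helper.

```php
<?php
// Sketch: count spans matched by an exact class comparison vs. a token match.
function countMatches(string $html, string $expression): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($dom))->query($expression)->length;
}

$html = '<div><span class="output">a</span><span class="output extra">b</span></div>';

// Exact comparison: only the first span qualifies.
echo countMatches($html, '//span[@class="output"]'), "\n"; // 1
// Token match: both spans carry the "output" class.
echo countMatches($html, '//span[contains(concat(" ", normalize-space(@class), " "), " output ")]'), "\n"; // 2
```

For the markup in the question the exact comparison is enough, because the span's class attribute is just "output".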

Then it looks for descendant a elements whose href attribute starts with "mailto:":

//span[@class="output"]//a[starts-with(@href, "mailto:")]
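Evaluating this path without string() returns a DOMNodeList, which helps when a span holds several links or when the address itself is wanted. In the sample the href is percent-encoded, so rawurldecode() can recover it. A sketch, with `mailtoLinks()` as a hypothetical helper name:

```php
<?php
// Sketch: list every mailto link below span.output with its text and decoded address.
function mailtoLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('//span[@class="output"]//a[starts-with(@href, "mailto:")]') as $a) {
        $links[] = [
            'text'    => trim($a->textContent),
            // Strip the "mailto:" prefix (7 characters) and undo percent-encoding.
            'address' => rawurldecode(substr($a->getAttribute('href'), 7)),
        ];
    }
    return $links;
}

print_r(mailtoLinks('<span class="output"><a href="mailto:%40%67%6d%61%69%6c">@mail</a></span>'));
```

For the sample link this yields the text "@mail" and the decoded address "@gmail".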

The result is a list of a element nodes (for the example content, a single node). The string() function casts the first node of that list to a string; if the list is empty, it returns an empty string.

string(//span[@class="output"]//a[starts-with(@href, "mailto:")])
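Because string() only looks at the first node, a page with several matching links yields just the first one, and a page with none yields an empty string instead of a non-object notice like the one in the question. A quick check of both cases, using the same expression inside a hypothetical `firstMailtoText()` helper:

```php
<?php
// Sketch: string() on a node-set returns the first node's text, or "" when empty.
function firstMailtoText(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($dom))->evaluate(
        'string(//span[@class="output"]//a[starts-with(@href, "mailto:")])'
    );
}

$two  = '<span class="output"><a href="mailto:a@example.com">first</a>'
      . '<a href="mailto:b@example.com">second</a></span>';
$none = '<span class="output"><script>/* no link */</script></span>';

var_dump(firstMailtoText($two));  // string(5) "first"
var_dump(firstMailtoText($none)); // string(0) ""
```

If every link is needed, drop string() and iterate the node list returned by DOMXPath::query() instead.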

ThW