I have to collect some data from a website. My data is wrapped in divs. Inside each div there is a title tag, and I need to get the text inside these title tags. How can I do this? I have written the following code. What modification do I have to make to achieve this?

<?php
$str = '';
$page =  file_get_contents('http://www.sarkari-naukri.in/');
$dom = new DOMDocument();
$dom->loadHTML($page);
$divs = $dom->getElementsByTagName('div');
$i = 0;
$len = $divs->length;
while($i<$len) {
    $div = $divs->item($i++);
    $id = $div->getAttribute('id');
    if(strpos($id,'post-') !== false ) {
        // I need to get the text inside the title tag inside this div
        $title = ''; // the title should be stored here
        $str = $str.$title;
    }
}
echo $str;

SAMPLE HTML

<body>
    <div id = 'post-1'>
         <title>title 1</title>
    </div>
    <div id = 'post-2'>
         <title>title 2</title>
    </div>
    <div id = 'post-3'>
         <title>title 3</title>
    </div>
</body>
Jinu Joseph Daniel

2 Answers

The following PHP DOMDocument code:

$id = $div->getAttribute('id');
if (strpos($id,'post-') !== false) {

can be expressed in XPath 1.0 with an XPath string function:

//div[contains(@id, 'post-')]

Reading: any div element that has an id attribute containing the string post-. By the rules of XPath you can extend the expression further, for example to select the title children of all those divs:

//div[contains(@id, 'post-')]/title
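
A minimal sketch of how this expression could be run from PHP with DOMXPath. The helper name collectPostTitles is hypothetical, and the sample markup from the question is parsed with loadXML() only because it happens to be well-formed, which keeps the example deterministic. For a live page you would load the markup with loadHTML() as in the question.

```php
<?php
// Hypothetical helper (not from the question): returns the text of every
// <title> that is a direct child of a <div> whose id contains "post-".
function collectPostTitles(DOMDocument $dom): array
{
    $xpath  = new DOMXPath($dom);
    $titles = [];
    foreach ($xpath->query("//div[contains(@id, 'post-')]/title") as $node) {
        $titles[] = trim($node->textContent);
    }
    return $titles;
}

// Sample markup taken from the question.
$sample = "<body>
    <div id = 'post-1'>
         <title>title 1</title>
    </div>
    <div id = 'post-2'>
         <title>title 2</title>
    </div>
    <div id = 'post-3'>
         <title>title 3</title>
    </div>
</body>";

// Assumption: the sample is well-formed, so it is parsed as XML here.
$dom = new DOMDocument();
$dom->loadXML($sample);

$titles = collectPostTitles($dom);
echo implode(', ', $titles), PHP_EOL; // prints "title 1, title 2, title 3"
```

Returning an array rather than one concatenated string leaves it to the caller to decide how the titles are joined.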
hakre

You can use an XPath query to retrieve the title information:

$xml = "<body>
    <div id = 'post-1'>
         <title>title 1</title>
    </div>
    <div id = 'post-2'>
         <title>title 2</title>
    </div>
    <div id = 'post-3'>
         <title>title 3</title>
    </div>
</body>";

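$str = '';

// Parse the markup; loadHTML() tolerates markup that is not well-formed XML.
$doc = new DOMDocument;
$doc->loadHTML($xml);

$xpath = new DOMXPath($doc);

// Select every <title> that is a direct child of a <div> directly under <body>.
$entries = $xpath->query('//body/div/title');
foreach ($entries as $entry) {
    // Append the text content of each matched <title>.
    $str .= $entry->nodeValue;
}

var_dump($str);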

j0k
  • Thanks for the awesome answer. I need to select divs with someAttribute = someValue. How do I do that? – Jinu Joseph Daniel Feb 09 '13 at 10:09
  • @JinuJD: That works with XPath as well; please use the search. E.g. see [XPath: How to select node with some attribute by index?](http://stackoverflow.com/questions/5818681/xpath-how-to-select-node-with-some-attribute-by-index). You should get comfortable with it after some time. – hakre Feb 09 '13 at 10:14
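
For the follow-up in the comments, an exact attribute match can be written as an XPath predicate of the form [@someAttribute='someValue']. A minimal sketch, assuming the id attribute and the value post-2 from the question's sample stand in for someAttribute and someValue, and again parsing the well-formed sample with loadXML() purely for illustration:

```php
<?php
// Sample markup taken from the question.
$sample = "<body>
    <div id = 'post-1'>
         <title>title 1</title>
    </div>
    <div id = 'post-2'>
         <title>title 2</title>
    </div>
    <div id = 'post-3'>
         <title>title 3</title>
    </div>
</body>";

// Assumption: the sample is well-formed, so it is parsed as XML here.
$dom = new DOMDocument();
$dom->loadXML($sample);
$xpath = new DOMXPath($dom);

// [@id='post-2'] keeps only divs whose id attribute equals "post-2" exactly,
// unlike contains(), which also accepts partial matches.
$matches = [];
foreach ($xpath->query("//div[@id='post-2']/title") as $node) {
    $matches[] = trim($node->textContent);
}

echo implode(', ', $matches), PHP_EOL; // prints "title 2"
```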