I am using the Symfony DomCrawler component to parse HTML like this:

<div>              //first level div
    <div>1</div>   //sub div
    <div>2</div>
    <div>
      <div></div>  // more levels and empty divs possible
    </div>
</div>
<div>
    <div>3</div>
    <div>4</div>
</div>

The values 1, 2, 3, 4 may vary or may be missing entirely (an empty div), and a div may also contain sub-divs, and so on. I'm stuck at selecting the first-level divs so that I can process them. My XPath query returns the first-level divs and also the sub-divs:

$crawler = new Crawler($html);
foreach ($crawler->filterXPath('//div') as $domElement) {
    var_dump($domElement->textContent);
}

returns

 string(2) "12"
 string(1) "1"
 string(1) "2"
 string(2) "34"
 string(1) "3"
 string(1) "4"

What should the XPath expression look like to prevent processing of sub-elements?

UPD:
The actual DOM structure that causes the trouble:

<div>              //first level div
    <div>1</div>   //sub div
    <div>2</div>
</div>
<div>
    <div>3</div>
    <div>4
        <div>5</div>
        <a>6</a>
    </div>
</div>

This DOM tree should be processed one first-level div at a time, and the logic applied depends on whether an <a> tag exists inside that div.

Mr Lister
Tesmen
  • What exactly do you want to print? – drkthng Nov 10 '15 at 14:10
  • I need to process the contents of every first-level div, but I still receive all div elements from the DOM tree, including the sub-levels. A solution like "//div[./div]" is not applicable, because some sub-divs can also have divs in them. – Tesmen Nov 10 '15 at 14:13
  • In the end I'd like to convert this DOM to a specific array for further logic. – Tesmen Nov 10 '15 at 14:14
  • Then you should provide an HTML structure that models your real one as closely as possible -> I edited your question accordingly. – drkthng Nov 10 '15 at 14:24
  • Corrected the HTML to the actual structure. – Tesmen Nov 10 '15 at 14:33
  • Updated my answer -> this will now return only first-level div elements -> have a look and see if this helps your case. – drkthng Nov 10 '15 at 14:38

2 Answers

In your specific case, if you only want the first-level div elements, you can search for div elements that do not have any div element above them (that is, no div ancestor):

"//div[not(ancestor::div)]"

This XPath expression selects only the first-level div elements.

Beware that this solution only fits your example. A more complex structure might need a different solution.
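As a rough illustration of how the selection could feed the "does it contain an <a>" logic from the question, here is a minimal sketch. It assumes PHP's built-in DOMDocument and DOMXPath rather than the Symfony Crawler, only so that it runs without Composer; the same expression string should be usable with filterXPath(). The sample markup copies the question's UPD, and the $result array shape ('text', 'hasLink') is made up for this example.

```php
<?php
// Sketch only: built-in DOM extension instead of Symfony's Crawler.
// The markup mirrors the UPD structure from the question.
$html = <<<HTML
<div>
    <div>1</div>
    <div>2</div>
</div>
<div>
    <div>3</div>
    <div>4
        <div>5</div>
        <a>6</a>
    </div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical result shape, chosen for illustration.
$result = [];

// Select only div elements that have no div ancestor (the first level).
foreach ($xpath->query('//div[not(ancestor::div)]') as $div) {
    // Check for an <a> anywhere below this first-level div.
    $hasLink = $xpath->query('.//a', $div)->length > 0;

    $result[] = [
        // Strip whitespace so the indentation of the markup does not matter.
        'text'    => preg_replace('/\s+/', '', $div->textContent),
        'hasLink' => $hasLink,
    ];
}

var_dump($result);
```

Each entry corresponds to one first-level div, so any further per-block logic can branch on 'hasLink' without visiting the nested divs a second time.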

drkthng

Try this:

$crawler = new Crawler($html);
foreach ($crawler->filterXPath('//div')->children() as $domElement) {
    var_dump($domElement->textContent);
}

EDIT:

In this specific case, you should try:

foreach ($crawler->filterXPath('//div/div') as $domElement) {
    var_dump($domElement->textContent);
}
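
To see what '//div/div' matches on the nested structure from the question's UPD, a small comparison sketch may help. It assumes PHP's built-in DOMDocument and DOMXPath instead of the Crawler so it runs standalone; the helper name textsFor() and the second expression, which restricts the parent to divs without a div ancestor, are additions for illustration and not part of the answer above.

```php
<?php
// Sketch only: compares two XPath expressions on the UPD markup.
$html = <<<HTML
<div>
    <div>1</div>
    <div>2</div>
</div>
<div>
    <div>3</div>
    <div>4
        <div>5</div>
        <a>6</a>
    </div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: whitespace-stripped text of every matched node.
function textsFor(DOMXPath $xpath, string $expression): array
{
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        $texts[] = preg_replace('/\s+/', '', $node->textContent);
    }
    return $texts;
}

// Every div whose parent is a div, at any depth (includes the nested "5").
$anyLevel = textsFor($xpath, '//div/div');

// Only div children of first-level divs (assumed variant for comparison).
$secondLevel = textsFor($xpath, '//div[not(ancestor::div)]/div');

var_dump($anyLevel, $secondLevel);
```

On this markup the first expression also picks up the div nested inside "4", so deeper structures may need the stricter parent condition.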
scoolnico
  • This will output only 1 2, without 3 4, for $crawler->filterXPath('//div')->children(); however, I guess the idea is close to a solution. – Tesmen Nov 10 '15 at 13:28