Get first level dom elements by Symfony Crawler

Question

I am using the Symfony Crawler component to parse HTML like this:

<div>              //first level div
    <div>1</div>   //sub div
    <div>2</div>
    <div>
      <div></div>  // more levels and empty divs possible
    </div>
</div>
<div>
    <div>3</div>
    <div>4</div>
</div>

The values 1, 2, 3, 4 may vary or may be missing entirely (an empty div), and a div may also contain sub-divs, and so on. I'm stuck at selecting the first-level divs so that I can process them. My XPath query returns the first-level divs and also the sub-divs:

$crawler = new Crawler($html);
foreach ($crawler->filterXPath('//div') as $domElement) {
    var_dump($domElement->textContent);
}

returns

 string(2) "12"
 string(1) "1"
 string(1) "2"
 string(2) "34"
 string(1) "3"
 string(1) "4"

What should the XPath expression look like to prevent processing of sub-elements?

UPD: the actual problematic DOM structure:

<div>              //first level div
    <div>1</div>   //sub div
    <div>2</div>
</div>
<div>
    <div>3</div>
    <div>4
        <div>5</div>
        <a>6</a>
    </div>
</div>

This DOM tree should be processed one first-level div at a time, with some logic that depends on whether an <a> tag exists inside it.

I need to process every first-level div for its contents, but I still receive all div elements from the DOM tree, including the sub-levels. A solution like "//div[./div]" is not applicable, because some sub-divs can also have divs in them. – Tesmen, Nov 10 '15 at 14:13
Finally, I'd like to convert this DOM to a specific array for further logic manipulations. – Tesmen, Nov 10 '15 at 14:14
Then you should provide an HTML structure that models your real one as closely as possible. I edited your question accordingly. – drkthng, Nov 10 '15 at 14:24
Updated my answer: it will now return only first-level div elements. Have a look to see if this helps your case. – drkthng, Nov 10 '15 at 14:38

drkthng · Answer 1 · 2015-11-10T14:38:09.597

2

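In your specific case, if you only want the first-level div elements, you can search for div elements that do not have any div elements above them:

"//div[not(ancestor::div)]"

This XPath returns only the first-level div elements. Note that the predicate uses the plain ancestor axis: written as ".//ancestor::div", it would also look at the ancestors of every descendant node, which include the div itself, so any non-empty first-level div would be excluded.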

Beware that this solution is only good for your example. A more difficult structure might need a different solution.
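As a rough illustration of the idea, here is a minimal sketch that runs the predicate not(ancestor::div) against the DOM from the question's update. It uses PHP's built-in DOMDocument and DOMXPath instead of the Crawler so that it runs without extra packages. The $result array layout and the keys 'text' and 'hasLink' are assumptions made for this example only, not something from the question.

```php
<?php
// Sample markup taken from the question's update.
$html = <<<HTML
<div>
    <div>1</div>
    <div>2</div>
</div>
<div>
    <div>3</div>
    <div>4
        <div>5</div>
        <a>6</a>
    </div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical result layout: one entry per first-level div.
$result = [];

// Select only divs that have no div among their ancestors.
foreach ($xpath->query('//div[not(ancestor::div)]') as $div) {
    $result[] = [
        // Strip whitespace so the indentation of the markup does not show up.
        'text' => preg_replace('/\s+/', '', $div->textContent),
        // Check for an <a> anywhere below this first-level div.
        'hasLink' => $xpath->query('.//a', $div)->length > 0,
    ];
}

print_r($result);
```

The same expression string should also work with $crawler->filterXPath(...) as used in the question, since each node yielded by the loop there is a plain DOMElement; that part is not tested here.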

edited Nov 10 '15 at 14:38

answered Nov 10 '15 at 13:16

drkthng


scoolnico · Answer 2 · 2015-11-10T13:34:05.663

0

Try it like this:

$crawler = new Crawler($html);
foreach ($crawler->filterXPath('//div')->children() as $domElement) {
    var_dump($domElement->textContent);
}

EDIT:

In this specific case, you should try:

foreach ($crawler->filterXPath('//div/div') as $domElement) {
    var_dump($domElement->textContent);
}

edited Nov 10 '15 at 13:34

answered Nov 10 '15 at 13:16

scoolnico


With $crawler->filterXPath('//div')->children(), this outputs only 1 and 2, without 3 and 4, but I guess the idea is close to a solution. – Tesmen Nov 10 '15 at 13:28
