I have this XML:

<Item id="3" idLevel="3">
    <Label qualifier="Usual">
        <LabelText language="ALL">BE01</LabelText>
    </Label>
    <Label qualifier="Usual">
        <LabelText language="EN">R&#xc9;GION DE BRUXELLES-CAPITALE / BRUSSELS HOOFDSTEDELIJK GEWEST</LabelText>
    </Label>
</Item>
<Item id="4" idLevel="3">
    <Label qualifier="Usual">
        <LabelText language="ALL">BE001</LabelText>
    </Label>
    <Label qualifier="Usual">
        <LabelText language="EN">VLAAMS GEWEST</LabelText>
    </Label>
</Item>
<Item id="123" idLevel="3">
    <Label qualifier="Usual">
        <LabelText language="ALL">RO001</LabelText>
    </Label>
    <Label qualifier="Usual">
        <LabelText language="EN">MACROREGIUNEA DOI</LabelText>
    </Label>
</Item>

I would like to fetch the value of the <LabelText language="EN"> whose neighbouring <LabelText language="ALL"> starts with "BE" followed by exactly three digits.

In this example, that would be the value from the second Item element: VLAAMS GEWEST

I have an idea of how to approach this in an ugly way, but I believe there should be a more flexible and elegant way to do it:

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler();
$crawler->addXmlContent($xml);
$crawler = $crawler->filterXPath('//Item[@idLevel="3"]');

foreach ($crawler as $domElement) {
    // Here I would use a regex to check whether the neighbouring
    // LabelText value is "BE" followed by three digits.
}

Is there a way to handle this with DomCrawler itself, instead of iterating over all elements and checking each one?

1 Answer

You can use a single XPath expression that returns just the text you need:

//Item[@idLevel="3"]/Label[string-length(preceding-sibling::Label/LabelText/text()) = 5 and starts-with(preceding-sibling::Label/LabelText/text(), "BE") and number(substring(preceding-sibling::Label/LabelText/text(), 3)) = number(substring(preceding-sibling::Label/LabelText/text(), 3))]/LabelText[@language="EN"]/text()

Breaking it down:

  • //Item[@idLevel="3"] - gets the Item nodes whose idLevel attribute has the value 3
  • /Label - their Label children for which...
  • [string-length(preceding-sibling::Label/LabelText/text()) = 5 - the preceding sibling Label/LabelText has text exactly 5 characters long...
  • and starts-with(preceding-sibling::Label/LabelText/text(), "BE") - and that text starts with BE...
  • and number(substring(preceding-sibling::Label/LabelText/text(), 3)) = number(substring(preceding-sibling::Label/LabelText/text(), 3))] - and the text from the third character onwards converts to a number. NaN is never equal to itself, so the comparison is false for non-numeric text. Note that number() also accepts strings such as "1.5" or "-12", so this is not a strict digits-only check.
  • /LabelText[@language="EN"]/text() - get the text of the LabelText node whose language attribute is EN
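
To make the breakdown above concrete, here is a minimal sketch that runs the expression with PHP's built-in DOMDocument and DOMXPath rather than DomCrawler, so it needs no Composer packages. Several things in it are assumptions, not part of the original post: the `<Items>` wrapper element (the fragment in the question has no single root), the made-up `BE1.5` item, the shortened label on the first item, the `findLabels` helper, the extra `[@language="ALL"]` filter on the sibling, and the `translate()` variant, which is one possible way to insist on digits only. It selects the LabelText elements and reads their text content instead of ending the path in `/text()`.

```php
<?php
// Assumption: the fragment is wrapped in a hypothetical <Items> root so it
// parses as one well-formed document. The BE1.5 item is invented to show
// where the two digit checks differ.
$xml = <<<'XML'
<Items>
    <Item id="3" idLevel="3">
        <Label qualifier="Usual"><LabelText language="ALL">BE01</LabelText></Label>
        <Label qualifier="Usual"><LabelText language="EN">BRUSSELS</LabelText></Label>
    </Item>
    <Item id="4" idLevel="3">
        <Label qualifier="Usual"><LabelText language="ALL">BE001</LabelText></Label>
        <Label qualifier="Usual"><LabelText language="EN">VLAAMS GEWEST</LabelText></Label>
    </Item>
    <Item id="5" idLevel="3">
        <Label qualifier="Usual"><LabelText language="ALL">BE1.5</LabelText></Label>
        <Label qualifier="Usual"><LabelText language="EN">NOT A REAL CODE</LabelText></Label>
    </Item>
    <Item id="123" idLevel="3">
        <Label qualifier="Usual"><LabelText language="ALL">RO001</LabelText></Label>
        <Label qualifier="Usual"><LabelText language="EN">MACROREGIUNEA DOI</LabelText></Label>
    </Item>
</Items>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// The sibling code, written once so the predicates below stay readable.
$code = 'preceding-sibling::Label/LabelText[@language="ALL"]';

// The answer's check: true whenever the tail converts to a number.
$loose = "number(substring({$code}, 3)) = number(substring({$code}, 3))";
// Stricter variant: removing all digits from the tail must leave nothing.
$strict = "translate(substring({$code}, 3), '0123456789', '') = ''";

/** Hypothetical helper: EN label texts whose sibling code passes $digitTest. */
function findLabels(DOMXPath $xpath, string $code, string $digitTest): array
{
    $expr = "//Item[@idLevel='3']/Label"
          . "[string-length({$code}) = 5 and starts-with({$code}, 'BE') and {$digitTest}]"
          . "/LabelText[@language='EN']";

    $texts = [];
    foreach ($xpath->query($expr) as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
}

echo implode(', ', findLabels($xpath, $code, $loose)), PHP_EOL;  // VLAAMS GEWEST, NOT A REAL CODE
echo implode(', ', findLabels($xpath, $code, $strict)), PHP_EOL; // VLAAMS GEWEST
```

With DomCrawler, the string built inside `findLabels` should be usable as the argument to `filterXPath()`, followed by `->text()` on the result, although that path is not exercised in this sketch.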
Wiktor Stribiżew
  • Brilliant! Thank you for including the explanation. –  May 02 '17 at 08:36
  • btw, where could I find docs to read about these conditions inside filterXPath? I need to expand it a little bit. –  May 02 '17 at 09:01
  • I think there are tons of resources on this. (Un)?fortunately, I had to learn XPath on an ad-hoc basis. I used to consult http://zvon.org ([here is a Zvon XPath tutorial](http://zvon.org/comp/r/tut-XPath_1.html)). However, there is lots of good stuff here on SO, too. BTW, how do you need to adjust the expression? – Wiktor Stribiżew May 02 '17 at 09:04