I am trying to extract the text of each li element as a list of strings, but I want to exclude the form element.

What I have so far does not return two items, because the a elements in between split the text into separate nodes:

//*[@id="quickPromoBucketContent"]//li[descendant::form]/text()

HTML:

<div class="bucket" id="quickPromoBucketContent">
 <div class="content">
  <ul class="qpUL">
  <li>Sparen Sie 5&nbsp;% beim Kauf von <a href="">Wasserdichte Handyhülle 2 Stück</a> wenn Sie 1 oder mehrere Auto Handy Halterung aus dem Angebot von UGREEN GROUP LIMITED UK
    erwerben! Geben Sie den Code 49DFYWAQ an der Kasse ein. <a href="" target="AmazonHelp">Weitere Informationen</a>        (Teilnahmebedingungen)
    <form method="post" action="/gp/item-dispatch">
    </form>
  </li>
  <li>Sparen Sie 5&nbsp;% beim Kauf von <a href="">USB C PD Schnellladekabel</a> wenn Sie 1 oder mehrere Auto Handyhalterung aus dem Angebot von UGREEN GROUP LIMITED UK erwerben!
    Geben Sie den Code 5BWVW4YN an der Kasse ein. <a href="" target="AmazonHelp">Weitere Informationen</a>        (Teilnahmebedingungen)
    <form method="post" action="/gp/item-dispatch">
    </form>
  </li>
  <li><span id="productPromotion_clipped"><span>Aktivieren Sie diesen Coupon</span>, um beim Kauf dieses Produkts bei Amazon.de 10&nbsp;% zu sparen.</span>
  </li><input type="hidden" name="specialOffersHidden" id="specialOffersHidden">
  <li>
    <div class="amabot_widget">
    </div>
  </li>
 </ul>
</div>
</div>
mrvnklm

1 Answer

Judging from your sample code, you want to:

  • select all li elements below the element with the attribute id="quickPromoBucketContent"
  • keep only those that have a form descendant
  • get the text of all child and descendant nodes from there

Try it like this:

//*[@id="quickPromoBucketContent"]//li[descendant::form]/descendant-or-self::*/text()
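To see the difference between the two queries, here is a minimal sketch in Python. It assumes the third-party lxml package is installed and uses a shortened copy of the markup from the question; the variable names are only illustrative. It also shows one possible way to get exactly one string per li by evaluating `normalize-space(.)` on each matched element.

```python
from lxml import html  # third-party package, assumed to be installed

# Shortened copy of the markup from the question (sample data only).
SNIPPET = """
<div class="bucket" id="quickPromoBucketContent">
  <ul class="qpUL">
    <li>Sparen Sie 5 % beim Kauf von <a href="">Wasserdichte Handyhülle 2 Stück</a> wenn Sie 1 oder mehrere Auto Handy Halterung erwerben! Geben Sie den Code 49DFYWAQ an der Kasse ein. <a href="" target="AmazonHelp">Weitere Informationen</a> (Teilnahmebedingungen)
      <form method="post" action="/gp/item-dispatch"></form>
    </li>
    <li>Sparen Sie 5 % beim Kauf von <a href="">USB C PD Schnellladekabel</a> wenn Sie 1 oder mehrere Auto Handyhalterung erwerben! Geben Sie den Code 5BWVW4YN an der Kasse ein. <a href="" target="AmazonHelp">Weitere Informationen</a> (Teilnahmebedingungen)
      <form method="post" action="/gp/item-dispatch"></form>
    </li>
    <li><span>Aktivieren Sie diesen Coupon</span></li>
  </ul>
</div>
"""

root = html.fromstring(SNIPPET)
BASE = '//*[@id="quickPromoBucketContent"]//li[descendant::form]'


def non_blank(nodes):
    """Strip each text node and drop whitespace-only ones."""
    return [t.strip() for t in nodes if t.strip()]


# Query from the question: only text nodes that are direct children of li,
# so the link texts are missing and each li is split into several fragments.
direct = non_blank(root.xpath(BASE + "/text()"))

# Query from the answer: text nodes of li and of every descendant element.
nested = non_blank(root.xpath(BASE + "/descendant-or-self::*/text()"))

# One string per li: evaluate normalize-space(.) relative to each match.
items = [li.xpath("normalize-space(.)") for li in root.xpath(BASE)]

print(len(direct), len(nested), len(items))
for item in items:
    print(item)
```

Both XPath queries return a flat list of text fragments, so the per-li grouping has to happen in the host language (or via a string function evaluated per element) if two items are wanted.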

wp78de
  • The `descendant-or-self::*` and `descendant::` are a little verbose. You could shorten it to `//*[@id="quickPromoBucketContent"]//li[.//form]//text()` – Daniel Haley Aug 30 '18 at 18:37
  • Also, the OP said "_but I want to exclude the form element_" so I think you'd need to add a predicate to `text()`. Example: `//*[@id="quickPromoBucketContent"]//li[.//form]//text()[not(ancestor::form)]` – Daniel Haley Aug 30 '18 at 18:42
  • @DanielHaley Thanks. Yes, the query can be shortened. However, I interpret the remark about the form elements differently. The OP is not 100% clear in that respect. – wp78de Aug 30 '18 at 18:52
  • Thank you, this is working. @DanielHaley can you explain what the difference is? I forgot: I also want to exclude all elements that have `target="AmazonHelp"` – mrvnklm Sep 03 '18 at 08:45
  • `//*[@id="quickPromoBucketContent"]//li[.//form]//text()[not(ancestor::form) and not(ancestor::a[@target="AmazonHelp"])]` would be the way. Is there also a way to exclude everything behind the `` even though it's part of the `li` element? – mrvnklm Sep 03 '18 at 08:51
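
The exclusion predicates discussed in these comments can be tried out with a short Python sketch. It assumes the third-party lxml package and a shortened copy of the question's markup; `clean_items` is a hypothetical helper name, not something from the thread. The predicate is applied relative to each li so that the fragments can be joined into one string per item.

```python
from lxml import html  # third-party package, assumed to be installed

# Shortened copy of the markup from the question (sample data only).
SNIPPET = """
<div id="quickPromoBucketContent">
  <ul class="qpUL">
    <li>Sparen Sie 5 % beim Kauf von <a href="">Wasserdichte Handyhülle 2 Stück</a> wenn Sie 1 oder mehrere Auto Handy Halterung erwerben! Geben Sie den Code 49DFYWAQ an der Kasse ein. <a href="" target="AmazonHelp">Weitere Informationen</a> (Teilnahmebedingungen)
      <form method="post" action="/gp/item-dispatch"></form>
    </li>
    <li>Sparen Sie 5 % beim Kauf von <a href="">USB C PD Schnellladekabel</a> wenn Sie 1 oder mehrere Auto Handyhalterung erwerben! Geben Sie den Code 5BWVW4YN an der Kasse ein. <a href="" target="AmazonHelp">Weitere Informationen</a> (Teilnahmebedingungen)
      <form method="post" action="/gp/item-dispatch"></form>
    </li>
    <li><span>Aktivieren Sie diesen Coupon</span></li>
  </ul>
</div>
"""

# Text nodes below the current li, skipping anything inside a form
# or inside a link with target="AmazonHelp".
TEXT_QUERY = (
    './/text()[not(ancestor::form)'
    ' and not(ancestor::a[@target="AmazonHelp"])]'
)


def clean_items(markup):
    """Return one whitespace-normalized string per li that contains a form."""
    root = html.fromstring(markup)
    lis = root.xpath('//*[@id="quickPromoBucketContent"]//li[.//form]')
    # Join the remaining fragments, then collapse runs of whitespace.
    return [" ".join("".join(li.xpath(TEXT_QUERY)).split()) for li in lis]


items = clean_items(SNIPPET)
for item in items:
    print(item)
```

Note that `@target="AmazonHelp"` is an equality test inside the predicate; a comma between the attribute and the value is not valid XPath.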