I'm trying to use XPath 1.0 to select all of the text within these li elements, except for the last one with class="detailCrumb".

I'd like my result to look like:

Home Photography Memory Cards & Accessories Memory Cards

These breadcrumbs will be dynamic based on the level of the site I'm at, so I can't specify positional requests, such as li[4]. How can I achieve this?

<div class="breadcrums-cont">
    <ul id="breadcrumbs">
        <li class="first">Home</li>
        <li>Photography</li>
        <li>Memory Cards &amp; Accessories</li> 
        <li>Memory Cards</li>
        <li class="detailCrumb">SanDisk Extreme Pro</li>    
    </ul>
</div>
kjhughes
tsb8m

2 Answers

To exclude li elements with class "detailCrumb", use the not() function:

//ul[@id="breadcrumbs"]/li[not(@class="detailCrumb")]
splash58
  • I tried this, but it only returns the first li (Home). – tsb8m Dec 08 '17 at 17:13
  • It returns a node list. In XPath 1.0, you can concatenate the nodes only with another tool, for example your programming language. – splash58 Dec 08 '17 at 17:16
  • I'm using a web scraping tool that uses XPath 1.0, so I don't have the ability to edit the underlying code I'm scraping from. – tsb8m Dec 08 '17 at 17:21
  • I'm sure about the XPath, but I know nothing about the web scraping tool :( – splash58 Dec 08 '17 at 17:27
XPath 1.0

The usual answer is that you can concatenate a fixed number of items:

concat(//ul[@id="breadcrumbs"]/li[not(@class="detailCrumb")][1], ' ',
       //ul[@id="breadcrumbs"]/li[not(@class="detailCrumb")][2], ' ',
       //ul[@id="breadcrumbs"]/li[not(@class="detailCrumb")][3])

To concatenate a variable number of items, as @splash58 has said (+1), you'll have to use string concatenation facilities of the language calling XPath – XPath 1.0 alone cannot do it.
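For anyone who does control the calling code, here is a minimal sketch of that host-language approach in Python. It assumes the markup is well-formed XML (the stray quote in the ul tag is removed in the sample). It uses only the standard library's ElementTree, which implements a subset of XPath without not(), so the class filter is done in Python rather than in the expression. The function name join_breadcrumbs and its skip_class parameter are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, with the stray quote in the <ul> tag
# removed so that it parses as well-formed XML.
MARKUP = """
<div class="breadcrums-cont">
    <ul id="breadcrumbs">
        <li class="first">Home</li>
        <li>Photography</li>
        <li>Memory Cards &amp; Accessories</li>
        <li>Memory Cards</li>
        <li class="detailCrumb">SanDisk Extreme Pro</li>
    </ul>
</div>
"""


def join_breadcrumbs(markup: str, skip_class: str = "detailCrumb") -> str:
    """Join the text of every breadcrumb <li> except those with skip_class."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset supports [@attr='value'] but not not(),
    # so select all <li> children here and filter by class in Python.
    items = root.findall(".//ul[@id='breadcrumbs']/li")
    texts = [
        " ".join("".join(li.itertext()).split())  # collapse inner whitespace
        for li in items
        if li.get("class") != skip_class
    ]
    return " ".join(texts)


print(join_breadcrumbs(MARKUP))
# prints: Home Photography Memory Cards & Accessories Memory Cards
```

With a full XPath 1.0 engine available in the host language, the li[not(@class="detailCrumb")] predicate could stay in the expression and only the final join would move into code.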

However, if you start with the string value of the entire list and then take away the unwanted string,

normalize-space(
   substring-before(//ul[@id="breadcrumbs"],
                    //ul[@id="breadcrumbs"]/li[@class="detailCrumb"]))

then you can achieve your requested result:

Home Photography Memory Cards & Accessories Memory Cards
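To see why this works, here is a rough Python emulation of the two XPath functions involved, applied to the string values of the ul and of the last li. It is only an illustration: substring_before, normalize_space, and string_value are hypothetical helpers written for this sketch, not part of any XPath library, and Python's split() treats a slightly wider set of characters as whitespace than XPath does.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question (stray quote in the <ul> tag removed).
MARKUP = """<div class="breadcrums-cont">
    <ul id="breadcrumbs">
        <li class="first">Home</li>
        <li>Photography</li>
        <li>Memory Cards &amp; Accessories</li>
        <li>Memory Cards</li>
        <li class="detailCrumb">SanDisk Extreme Pro</li>
    </ul>
</div>"""


def substring_before(text: str, marker: str) -> str:
    """Like XPath substring-before(): text before the first marker, else ''."""
    if not marker:
        return ""
    head, found, _ = text.partition(marker)
    return head if found else ""


def normalize_space(text: str) -> str:
    """Roughly like XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


def string_value(element: ET.Element) -> str:
    """XPath string value of an element: all descendant text, concatenated."""
    return "".join(element.itertext())


root = ET.fromstring(MARKUP)
ul = root.find(".//ul[@id='breadcrumbs']")
last = ul.find("li[@class='detailCrumb']")

result = normalize_space(substring_before(string_value(ul), string_value(last)))
print(result)
# prints: Home Photography Memory Cards & Accessories Memory Cards
```

One caveat follows from the "first occurrence" behavior: if the text of the detailCrumb item also appears earlier in the list, the cut happens at that earlier position and the result is truncated.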

XPath 2.0

You can join a variable number of items in XPath via string-join():

string-join(//ul[@id="breadcrumbs"]/li[not(@class="detailCrumb")], ' ')

returns

Home Photography Memory Cards & Accessories Memory Cards 

as requested.

kjhughes