xidel -se '//strong[@class="n-heading"][1]/text()[1]' 'https://www.anekalogam.co.id/id'

prints the same output three times:

15 June 2020 
                     
15 June 2020 
                     
15 June 2020  

So, what should I do to select only one of them?

Edit:

Inside the strong element with that class, the value looks like this:

 15 June 2020 
                     

How can I print only "15 June 2020"?

CuriousNewbie

1 Answer


Let me illustrate why this happens with the following example.

'test.htm':

<html>
  <body>
    <div>
      <span>test1</span>
      <span>test2</span>
      <span>test3</span>
    </div>
    <div>
      <span>test4</span>
    </div>
    <div>
      <span>test5</span>
    </div>
    <div>
      <span>test6</span>
    </div>
  </body>
</html>
xidel -s test.htm -e '//div[1]/span[1]'
test1

xidel -s test.htm -e '//span[1]'
test1
test4
test5
test6

xidel -s test.htm -e '(//span)[1]'
test1

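In other words, the [1] in //span[1] is applied to the span children of each parent separately, so every div contributes its own first span. To get only the first match in the whole document, put the "strong"-node path between parentheses, so that [1] applies to the complete result: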

xidel -s https://www.anekalogam.co.id/id -e '(//strong[@class="n-heading"])[1]/text()[1]'

The parentheses aren't needed if you select the strong-node through its parent node instead:

xidel -s https://www.anekalogam.co.id/id -e '//p[@class="n-smaller ngc-intro"]/strong/text()[1]'
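To check the same per-parent behaviour of [1] outside xidel, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. This is an illustration added for comparison, not part of xidel: ElementTree supports only a small XPath subset and has no parenthesized expressions, so (//span)[1] is emulated by taking the first item of the full result list. The test.htm sample above is inlined as a string so the snippet is self-contained.

```python
import xml.etree.ElementTree as ET

# The 'test.htm' sample from above, inlined as a string.
TEST_HTM = """<html>
  <body>
    <div>
      <span>test1</span>
      <span>test2</span>
      <span>test3</span>
    </div>
    <div>
      <span>test4</span>
    </div>
    <div>
      <span>test5</span>
    </div>
    <div>
      <span>test6</span>
    </div>
  </body>
</html>"""

root = ET.fromstring(TEST_HTM)

# Like XPath //span[1]: [1] is tested against each span's position among
# the span children of its own parent, so every div contributes one span.
per_parent = [e.text for e in root.findall(".//span[1]")]

# Like XPath (//span)[1]: collect all spans in document order first,
# then take the first item of that single combined sequence.
all_spans = root.findall(".//span")
first_overall = all_spans[0].text

print(per_parent)     # ['test1', 'test4', 'test5', 'test6']
print(first_overall)  # test1
```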

[Bonus]

You've probably noticed already that the desired text-node spans 2 lines and ends with a &nbsp;. To have xidel return just "15 June 2020":

xidel -s https://www.anekalogam.co.id/id -e '//p[@class="n-smaller ngc-intro"]/strong/normalize-space(substring-before(text(),x:cps(160)))'

- x:cps() is a shorthand for codepoints-to-string() (and string-to-codepoints()) and 160 is the codepoint for a "No-Break Space".
- text()[1] isn't needed, because whenever you feed a sequence to a function that expects a single string, xidel only uses the first item of that sequence.
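Why both functions are needed can be sketched in Python with hand-written equivalents of XPath's substring-before() and normalize-space() (these helpers are illustrative emulations, not xidel code). The text node below is a hypothetical reconstruction of the page's value; the real whitespace may differ. The key point: normalize-space() only treats space, tab, CR and LF as whitespace, so on its own it would leave the trailing no-break space in place.

```python
import re

NBSP = chr(160)  # U+00A0 NO-BREAK SPACE, the codepoint x:cps(160) refers to

def substring_before(s: str, sep: str) -> str:
    # XPath substring-before(): the part before the first occurrence of sep,
    # or the empty string when sep does not occur at all.
    before, found, _ = s.partition(sep)
    return before if found else ""

def normalize_space(s: str) -> str:
    # XPath normalize-space(): collapse runs of XPath whitespace (space, tab,
    # CR, LF) to one space and trim both ends. NBSP is NOT XPath whitespace.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# Hypothetical text node: leading space, the date, a no-break space,
# then a line break plus indentation.
text_node = " 15 June 2020" + NBSP + "\n                     "

print(repr(normalize_space(text_node)))                          # '15 June 2020\xa0'
print(repr(normalize_space(substring_before(text_node, NBSP))))  # '15 June 2020'
```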

Reino
  • Thank you @Reino, your answer is very complete and clear; I even changed my initial post content. – CuriousNewbie Jun 16 '20 at 02:11
  • @Reino, what if we want to extract not just the first item, but the first two items? E.g., with your sample `test.htm`, extract the first two `span`s from the first `div`? I tried `xidel -s test.htm -e '//div[1]/span[:2]'` and `xidel -s test.htm -e '//div[1]/span[1:2]'`, but it seems xidel doesn't support such slicing... – A S Mar 27 '22 at 03:44
  • 1
    That's not valid XPath/XQuery syntax. Please have look at "[Filter Expressions](https://www.w3.org/TR/xquery-31/#id-filter-expression)" (or maybe you'll even find my [hobby-project-notes](https://github.com/Reino17/xivid/blob/master/xivid_notes.txt#L5-L80) useful). `[1]` is a shorthand for `[position() = 1]`. To filter the first 2 items you'll always to enter in full `[position() = (1,2)]`. So `-e '//div[1]/span[position() = (1,2)]'` should do, or alternatively: `-e '//div[1]/span[position() lt 3]'`, `-e '//div[1]/subsequence(span,1,2)'`, etc. – Reino Mar 27 '22 at 11:27
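The slice syntax tried above ([:2], [1:2]) is Python-style, not XPath. As a rough cross-check, this Python sketch (standard-library ElementTree, an assumption made for illustration) mirrors the three XPath variants from the reply. XPath positions are 1-based, so position() = (1,2), position() lt 3 and subsequence(span,1,2) all correspond to the 0-based Python slice [0:2]. The sample is a trimmed copy of test.htm.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of 'test.htm': only the parts needed here.
root = ET.fromstring(
    "<html><body>"
    "<div><span>test1</span><span>test2</span><span>test3</span></div>"
    "<div><span>test4</span></div>"
    "</body></html>"
)

# //div[1]: all divs share one parent here, so exactly one div qualifies.
first_div = root.findall(".//div[1]")[0]
spans = first_div.findall("span")

# enumerate(..., start=1) gives XPath-style 1-based positions.
by_membership = [s for pos, s in enumerate(spans, start=1) if pos in (1, 2)]  # position() = (1,2)
by_comparison = [s for pos, s in enumerate(spans, start=1) if pos < 3]        # position() lt 3
by_subsequence = spans[0:2]  # subsequence(span, 1, 2): start at item 1, take 2 items

texts = [[s.text for s in seq] for seq in (by_membership, by_comparison, by_subsequence)]
print(texts)  # [['test1', 'test2'], ['test1', 'test2'], ['test1', 'test2']]
```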