How to change the order of exported variables in Xidel?

Question

I am using Xidel to scrape information from a webpage, and I am stuck on exporting the information in a different order than it appears on the page.

Example:

<tr>
<td></td>
<td></td>
<td></td>
<td><a><font><b>{ location:=. }</b></font>{ title:=. }</a></td>
<td>{ dates:=. }</td>
<td></td>
</tr>

This template exports the variables in the order they appear on the page. Is there any way in Xidel to change the order?

MatrixView · Answer 1 · 2014-10-15T06:43:44.517


This may be as easy as:

xidel -q page.html -e subtitle:=//h2,title:=//h1

Something like the following (with several "-e" parameters) would also work, but like the previous command it outputs all subtitles on the page first and then all titles, which is probably not what you want:

xidel -q page.html -e "<div><h2>{subtitle:=.}</h2></div>+" -e "<div><h1>{title:=.}</h1></div>+"

As far as I know, Xidel has no built-in ordering feature for your case. But what you CAN do is write a script that saves the values as environment variables with Xidel's --output-format cmd (on Windows) and then echoes or processes those variables in the order you want.
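The catch with capturing each variable separately (as with the "-e" commands above) is that you end up with one list per variable, so the script has to pair the values back up before printing them in a new order. A minimal Python sketch of that pairing step, assuming the values were already captured into two lists in document order (the sample values are made up) and that every container has both a subtitle and a title:

```python
# Hypothetical values, as if captured separately from the page
# (for example one list from //h2 and one from //h1).
subtitles = ["Sub A", "Sub B", "Sub C"]
titles = ["Title A", "Title B", "Title C"]


def pair_in_order(first, second):
    """Zip two per-item lists so each item's values stay together.

    Assumes both lists come from the same containers in document order.
    The length check catches a count mismatch, but not values missing
    from different containers, so per-container iteration is safer.
    """
    if len(first) != len(second):
        raise ValueError("lists differ in length; values cannot be paired safely")
    return list(zip(first, second))


# Emit subtitle first, then title, one container at a time.
for subtitle, title in pair_in_order(subtitles, titles):
    print(f"subtitle={subtitle}")
    print(f"title={title}")
```

This only works if the two lists line up; if the page can have a container without a subtitle, iterating per container (as in the XQuery for-loop approach) avoids the problem.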

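Dirkk has given a great tip (iterate over the containers instead of grouping); with that, your line could look something like this:

xidel -q page.html --xquery "for $i in //div return (concat('sub:=',$i/h2), concat('title:=',$i/h1))"

(The double quotes suit Windows cmd. In a Unix shell, double quotes let the shell expand $i before Xidel sees it, so wrap the query in single quotes there and use double quotes for the string literals inside it.)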

edited Oct 15 '14 at 06:43

answered Oct 14 '14 at 06:44

MatrixView


Thanks for your answer! The actual page is more complicated than what I posted. For example: xidel page.html -e "{ location:=. }{ title:=. }{ dates:=. }+". But this prints the variables in the order they appear on the page. How would I change the order? Any idea? – Jirka Matousek Oct 14 '14 at 09:13
Thanks @MatrixView I will look more into the saving/echoing the variables. Sounds like a viable option! – Jirka Matousek Oct 14 '14 at 13:46

score 0 · Answer 2 · answered Oct 14 '14 at 17:43


I have never used this tool, but after a quick look at the documentation, and seeing that it supports XQuery, I guess the following should work:

xidel -q page.html --xquery "for $div in //div return ($div/h2, $div/h1)" --output-format xml

This assumes your page has several such div elements and that you want each title output together with its own subtitle, subtitle first, i.e. not all subtitles first. Also, since you have not given a more specific example, it simply selects all divs and iterates over them. In real-world HTML you probably want to select by more characteristic features (like id attributes).


dirkk


Thank you! I updated the code to show a better example of what I am trying to solve. How would I evaluate that in XQuery? – Jirka Matousek Oct 15 '14 at 11:08
First of all, if you edit, you can and should modify the question directly; there is no need for an Update section. Your XML is basically still the same. You would select all table rows by using `//tr`, but if there are other tables on this page, their rows would be selected as well. Use some uniquely identifying elements of the webpage, e.g. an id, an h1 or h2 header, or some characteristic link. – dirkk Oct 15 '14 at 12:14
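To make the per-row idea concrete for the table in the question, here is a minimal Python sketch using the standard library's ElementTree (not Xidel). It restricts the selection to one table, walks its rows, and prints title, location and dates in that order. It assumes the markup is well-formed XML, that the target table carries a hypothetical id="events", and that the title is the text following </font> inside the link; the sample page and its values are invented, and real-world HTML usually needs a tolerant HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page: the id "events" is an assumption standing
# in for whatever uniquely identifies the target table on the real page.
PAGE = """
<html><body>
  <table id="nav"><tr><td>Home</td></tr></table>
  <table id="events">
    <tr><td/><td/><td/>
      <td><a><font><b>Prague</b></font>Spring Concert</a></td>
      <td>May 1-3</td><td/></tr>
    <tr><td/><td/><td/>
      <td><a><font><b>Brno</b></font>Jazz Night</a></td>
      <td>June 7</td><td/></tr>
  </table>
</body></html>
"""


def extract_rows(page_xml, table_id):
    """Return one dict per row of the selected table, named like the Xidel variables."""
    root = ET.fromstring(page_xml)
    table = root.find(f".//table[@id='{table_id}']")  # scope: skip other tables
    if table is None:
        return []
    rows = []
    for tr in table.findall("tr"):
        cells = tr.findall("td")
        font = cells[3].find("a/font")
        rows.append({
            "location": font.findtext("b", default="").strip(),
            # Assumed: the title is the text right after </font> inside <a>.
            "title": (font.tail or "").strip(),
            "dates": (cells[4].text or "").strip(),
        })
    return rows


def reorder(rows, order):
    """Pick each row's values in the requested order; rows stay together."""
    return [[row[name] for name in order] for row in rows]


for values in reorder(extract_rows(PAGE, "events"), ["title", "location", "dates"]):
    print(" | ".join(values))
```

Because the loop works row by row, each title stays next to its own location and dates, which is the same idea as the for-loop over //div in the answers, applied to table rows.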
